tl;dr
Three tips for reproducibility in R: centralise everything; report with code; manage your workflows.
Reproducevangelism
I spoke at the Department for Education’s Data Science Week. I wanted everyone – newer and more experienced users alike – to learn at least one new thing about reproducibility with R and RStudio.
The slides are embedded below and you can also get them fullscreen online (press ‘F’ for fullscreen and ‘P’ for presenter notes) and find the source on GitHub.
Three things
The three things to achieve reproducibility were very broad. I focused on R and some specific packages that could be helpful, but the ideas are transferable and there are lots of ways to achieve the same thing.
The things were:
1. Centralise everything
Get code, functions, data, documentation in one place. Use R Projects in RStudio and write packages. This makes code more shareable and improves the chance that others can recreate things on their machine.
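As a rough illustration of what 'one place' can look like, here is a minimal base-R sketch. The folder name, file names and the summarise_mpg() function are all made up for the example, and the project is built in a temporary folder so nothing touches your real files. In an RStudio Project the working directory is the project root, so the same idea works with short relative paths like "data/cars.csv".

```r
# Hypothetical project skeleton, built in a temporary folder for illustration
project <- file.path(tempdir(), "my-analysis")
dir.create(file.path(project, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "data"), showWarnings = FALSE)

# Functions live in R/, data lives in data/
writeLines(
  "summarise_mpg <- function(df) mean(df$mpg)",
  file.path(project, "R", "summarise.R")
)
write.csv(mtcars, file.path(project, "data", "cars.csv"), row.names = FALSE)

# Every path is built from the project root, never hard-coded to one machine
source(file.path(project, "R", "summarise.R"))
cars <- read.csv(file.path(project, "data", "cars.csv"))
mean_mpg <- summarise_mpg(cars)
print(mean_mpg)
```

Anyone who copies the whole folder gets the code, the function and the data together, which is most of the battle.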
2. Report with code
Put code inside your report so that updates to data and code are reflected whenever the report is re-rendered. Use R Markdown and other formats like Yihui Xie’s {xaringan} for reproducible slides and {bookdown} for reproducible books.
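To make the idea concrete, here is a small sketch that knits a tiny R Markdown source held in a character vector. It assumes {knitr} is installed and uses knitr::knit() rather than a full rmarkdown::render() call, so it produces Markdown text and doesn't need pandoc. The report content and the n_cars object are invented for the example.

```r
library(knitr)

# Three backticks, built programmatically to keep this example easy to copy
fence <- strrep("`", 3)

# A tiny R Markdown source: one code chunk and one sentence with inline code
rmd <- c(
  "# Car report",
  "",
  paste0(fence, "{r}"),
  "n_cars <- nrow(mtcars)",
  fence,
  "",
  "The dataset has `r n_cars` cars."
)

# Knit the source; the inline code is replaced by its current value
report <- knit(text = rmd, quiet = TRUE)
cat(report)
```

If the data changes, re-knitting updates the number in the sentence without anyone having to retype it.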
3. Manage workflows
Don’t use your brain to store information about the dependencies within your analysis. Use {drake} by Will Landau instead. It remembers all the relationships between the files, objects and functions in your analysis and only re-runs what needs to be re-run following changes.
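Here is a minimal sketch of that workflow, assuming {drake} is installed. The target names (raw, model, coefs) are made up for the example. Note that make() stores its results in a hidden .drake/ folder in your working directory.

```r
library(drake)

# Each target is a named step; drake works out the dependencies between them
plan <- drake_plan(
  raw   = mtcars,
  model = lm(mpg ~ wt, data = raw),
  coefs = coef(model)
)

# First run: builds every target and caches the results
make(plan)

# Read a finished target back from the cache
model_coefs <- readd(coefs)
print(model_coefs)

# Second run: nothing has changed, so drake has nothing to rebuild
make(plan)
```

Change the model formula in the plan and run make() again: model and coefs should be rebuilt, while raw is left alone.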
Acknowledgements
I keep referring to the same resources about reproducibility. Take a look at:
- Reproducible Analytical Pipelines (RAP), a UK government initiative to make publications more reproducible
- The Turing Way, a book about reproducibility from the Alan Turing Institute
- Putting the R into reproducible research, some excellent and comprehensive slides by Anna Krystalli
On this blog
Relevant rostrum.blog reproducibility-related writings:
Environment
Session info
Last rendered: 2023-07-22 16:29:14 BST
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.2 compiler_4.3.1 fastmap_1.1.1
[4] cli_3.6.1 tools_4.3.1 htmltools_0.5.5
[7] xaringanExtra_0.7.0 rstudioapi_0.15.0 yaml_2.3.7
[10] rmarkdown_2.23 knitr_1.43.1 jsonlite_1.8.7
[13] xfun_0.39 digest_0.6.33 rlang_1.1.1
[16] evaluate_0.21