Hooray for Yihui
Yihui Xie is an R legend. He was, however, recently laid off by his employer, Posit.
I’ve personally benefited a great deal from Yihui’s work, from writing reproducible presentations with {xaringan} to producing the original version of this blog with {blogdown}.
On a grander scale, Yihui’s contributions to the R ecosystem have had a lasting and transformational impact on how we generate Official Statistics in the UK, where R Markdown and {knitr} in particular are essential and ubiquitous tools.
So much so that we have a custom Yihui Slack emoji.
That’s a RAP
Put (far too) simply, a Reproducible Analytical Pipeline (RAP) is any code-driven, version-controlled workflow that reads data, processes it and creates consumable outputs, while ensuring that the process can be re-run in the future and by others.
RAP was birthed from ‘DataOps’ principles with a focus on the production of statistical publications: reports and data files for public consumption, published officially on the UK government’s website. These files are important for transparency and decision making.
These days, RAP is so much more: it’s a way of thinking, a community and a movement. Its ethos has spread across the UK public sector and is gaining traction globally through efforts like Bruno Rodrigues’s excellent book.
R is for RAP
RAP is language agnostic, but R has emerged as the preferred option for statistical production in the UK’s government and public sector. Why? Possibly because R is a data- and stats-first language and therefore a natural choice for statistics professionals.
Of course, R can easily cover the whole ‘soup-to-nuts’ workflow. Not just ingestion and digestion of data, but also, crucially, the creation of reports. R Markdown and {knitr} are the obvious tools for this kind of document generation, for which we must thank Yihui and his tireless and humble efforts.
But what makes R Markdown so conducive to RAP, in particular? Well, stats publications are generally periodical (often weekly) and R Markdown is perfect for literate programming at pace: you can create a skeleton document that can be updated dynamically with R code, saving so much time when a new version of the publication needs to be created with fresh data.
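To make the 'skeleton' idea concrete, here is a minimal sketch, assuming only that {knitr} is installed. The skeleton text, the `knit_release()` helper and the toy data frames are all hypothetical and invented for illustration. In practice you would more likely keep the skeleton in an .Rmd file and render it with `rmarkdown::render()`, passing fresh values through its `params` argument.

```r
# Minimal sketch: one skeleton, re-knitted for each new release.
# Assumes {knitr} is installed. The skeleton text, the helper name and
# the toy data are hypothetical stand-ins, not a real publication.
library(knitr)

# Plain text marked up with inline R code that is evaluated at knit time
skeleton <- c(
  "# Weekly release: `r period`",
  "",
  "This release covers `r nrow(dat)` records.",
  "The largest value recorded was `r max(dat$value)`."
)

# Knit the same skeleton against a given period and dataset
knit_release <- function(period, dat) {
  env <- new.env()
  env$period <- period
  env$dat <- dat
  knit(text = skeleton, envir = env, quiet = TRUE)
}

# Two weeks of made-up data: the prose updates itself each time
week_1 <- data.frame(value = c(3, 8, 5))
week_2 <- data.frame(value = c(4, 9, 7, 6))

cat(knit_release("2024-01-15", week_1), "\n")
cat(knit_release("2024-01-22", week_2), "\n")
```

The skeleton never changes between releases. Only the inputs do, which is what makes a periodical publication cheap to refresh.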
Crucially, R Markdown is relatively simple to learn and use. You write some plain text and mark it up with simple adornments. This perfectly suits the range of skills and abilities in statistical teams across the public sector, where staff are often ‘numbers-people’ first and ‘coders’ second.
This is why R Markdown has been a central tenet of RAP since Dr Matt Upson, RAP’s ‘Founding Father’, noted it in his germinal blog post.
Down, but not out
Of course, I’m not alone: many others have talked about their appreciation for Yihui and his work, including Eric and Mike’s discussion on the R Weekly podcast and Emily’s thread.
You can also take a look at the incredible number of people who have signed up to sponsor Yihui on GitHub, which sits just shy of 300 at the time of writing.
Thank you, Yihui. We look forward to what comes next.
Environment
Session info
Last rendered: 2024-01-22 17:48:46 GMT
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.2 compiler_4.3.1 fastmap_1.1.1 cli_3.6.2
[5] tools_4.3.1 htmltools_0.5.6.1 rstudioapi_0.15.0 yaml_2.3.8
[9] rmarkdown_2.25 knitr_1.45 jsonlite_1.8.7 xfun_0.41
[13] digest_0.6.33 rlang_1.1.3 evaluate_0.23