install.packages("remotes") # if not yet installed
::install_github("matt-dray/chunktop") remotes
tl;dr
I developed a convenience function for parsing chunk options out of an R Markdown document for collaborators to edit. But it doesn’t work in reverse, which is the whole point. Some notes on failure.
Note
Since this was posted, the folks at ThinkR have a pretty good solution with the {lightparser} package: ‘From Rmd And Qmd To Tibble And Back’. The ‘And Back’ being the crucial part.
A very simple problem (?)
John asked about parsing chunk options from R Markdown files (Rmd), passing the content to editors, then reincorporating the edited text back into the chunk whence it came.
Well, John, have I got the solution for you. Oh wait, I absolutely don’t. Rather I’ve looked into {parsermd}, messed around and… gave up. C’est la vie.
In my hubris, I began the package {chunktop} to solve John’s problem. Goal: parse Rmd chunk options, write them to file, allow for edits, read them back in and reintegrate the strings into the chunks they came from. Then re-write the Rmd with the changes. Boom.
Spoiler: I was naïve.
A good start
My advice: don’t install {chunktop}; it’s already archived on GitHub.
It steals functionality from the excellent {parsermd} package to grab Rmd chunk options using the very appropriate rmd_get_options()
function.
The package contains a demo Rmd for testing purposes.
<- system.file("extdata/demo1.Rmd", package = "chunktop") path
You can read in this demo Rmd with parsermd::parse_rmd()
, which is actually a little abstract syntax tree (AST) that demonstrates the hierarchy of the page and contains all the data.
if (!require("parsermd")) install.packages(parsermd)
Loading required package: parsermd
<- parse_rmd(path)) (rmd
├── YAML [1 lines]
└── Heading [h1] - A header
├── Chunk [r, 1 opt, 1 lines] - chunk1
└── Heading [h2] - A subheader
├── Chunk [r, 2 opts, 1 lines] - chunk2
├── Markdown [2 lines]
├── Chunk [r, 1 lines] - chunk3
└── Chunk [r, 2 opts, 1 lines] - chunk4
And we can take a look at the lines in the Rmd doc.
<- parsermd::as_document(rmd)) (rmd_doc
[1] "---"
[2] "title: Test"
[3] "---"
[4] ""
[5] "# A header"
[6] ""
[7] "```{r chunk1, eval = FALSE}"
[8] "1 + 1"
[9] "```"
[10] ""
[11] "## A subheader"
[12] ""
[13] "```{r chunk2, fig.cap = \"I am a fig caption.\", fig.height = 4}"
[14] "plot(mtcars$mpg, mtcars$cyl)"
[15] "```"
[16] ""
[17] "Some text."
[18] ""
[19] ""
[20] "```{r chunk3}"
[21] "plot(mtcars$mpg, mtcars$disp"
[22] "```"
[23] ""
[24] "```{r chunk4, fig.cap = \"I am another fig caption.\", eval = TRUE}"
[25] "plot(mtcars$mpg, mtcars$drat)"
[26] "```"
[27] ""
So it contains a bunch of chunks with various options.
So, here’s as far as I got. First off, given our Rmd file, we can use get_chunktop()
to read the chunk options of interest using {parsermd} and stick them a nice little list. Trivially.
library(chunktop)
<- get_chunktop(rmd_file = path, opts = c("fig.cap", "eval"))) (chunktop_list
$chunk1
$chunk1$eval
[1] "FALSE"
$chunk2
$chunk2$fig.cap
[1] "\"I am a fig caption.\""
$chunk4
$chunk4$fig.cap
[1] "\"I am another fig caption.\""
$chunk4$eval
[1] "TRUE"
Which is trivially massaged into a data.frame.
<- chunktop_to_df(chunktop_list)) (chunktop_df
chunk_name option_name option_value
1 chunk1 eval FALSE
2 chunk2 fig.cap "I am a fig caption."
3 chunk4 fig.cap "I am another fig caption."
4 chunk4 eval TRUE
Which is trivially written to a CSV to share with your colleagues for editing.
<- tempfile(fileext = ".csv")
csv_file write.csv(chunktop_df, csv_file, row.names = FALSE)
Which is trivially read back into R.
<- read.csv(csv_file)) (chunktop_df2
chunk_name option_name option_value
1 chunk1 eval FALSE
2 chunk2 fig.cap "I am a fig caption."
3 chunk4 fig.cap "I am another fig caption."
4 chunk4 eval TRUE
Which is trivially converted back into a list.
<- df_to_chunktop(chunktop_df2)) (chunktop_list2
$chunk1
$chunk1$eval
[1] "FALSE"
$chunk2
$chunk2$fig.cap
[1] "\"I am a fig caption.\""
$chunk4
$chunk4$fig.cap
[1] "\"I am another fig caption.\""
$chunk4$eval
[1] "TRUE"
And then we trivially, um, well, I don’t know, actually. I just assumed there would be an elegant way to convert the listed options back into the chunk options.
The function parsermd::rmd_set_options()
exists, but appears to set options for all chunks, not selected options. Maybe I misunderstood, let me know.
For sure you could do a thing where you parse the strings out of the Rmd and replace them with the new option values. You are welcome to do that, friend.
Descent into masochism
Anyway, having implied that I am too lazy to do that, I actually tried something much more procastinatory when I hit a brick wall. Enjoy this bonus content.
I wondered if I could just… extract the chunk options from the original Rmd myself, dependency-free. For ‘fun’.
But I turned I into a sort of unhinged code golf. Well, maybe ‘code wild-goose-chase’. The challenge: is it possible to extract chunk options from an arbitrary Rmd using only base R in a single base-R pipe (|>
) chain? Haha, of course.
First, I’ve got a few input checks that I’ve put into a .check_inputs()
so that it causes less clutter in the main function.
Click for function to check inputs
<- function(rmd_lines, engine, yaml_out) {
.check_inputs
if (!inherits(rmd_lines, "character")) {
stop("Argument 'rmd_lines' must be a character vector.", call. = FALSE)
}
if (engine != "r") {
stop("For now, argument 'engine' must be 'r'.", call. = FALSE)
}
if (!is.null(yaml_out) & !inherits(yaml_out, "character")) {
stop("Argument 'yaml_out' must be NULL or character.", call. = FALSE)
}
if (!is.null(yaml_out)) {
if (tools::file_ext(yaml_out) != "yaml") {
stop("Argument 'yaml_out' must have extension '.yaml'.", call. = FALSE)
}
if (!dir.exists(dirname(yaml_out))) {
stop("The directory in 'yaml_out' does not exist.", call. = FALSE)
}
}
}
So here’s an overengineered grab_chunktop()
function that extracts Rmd chunk options.
<- function(rmd_lines, engine = "r", yaml_out = NULL) {
grab_chunktop
.check_inputs(rmd_lines, engine, yaml_out)
grep("```\\{", rmd_lines)] |> # isolate Rmd chunks
rmd_lines[gsub(paste0("```\\{", engine , " |\\}$"), "", x = _) |> # retain options
strsplit(", ") |> # list of chunks, split by commas
(\(chunk_str) {setNames(object = chunk_str, sapply(chunk_str, "[[", 1)) # name-value
|> # name the list elements after chunks
})() sapply("[", -1) |> # forget first element (engine)
lapply( # split options/values into nested list under named chunk
\(opt_str) {strsplit(opt_str, "=") |> # "eval=TRUE" to "eval" and "TRUE"
# first element to name
(\(opt_str) { setNames(object = opt_str, trimws(sapply(opt_str, "[[", 1)))}
|>
)() sapply("[", -1) |> # now remove first element
trimws() |>
as.list() |>
type.convert(as.is = TRUE) # make sure "1" becomes 1
}
)
}
Absolutely disgusting. But it works!
grab_chunktop(rmd_doc)
$chunk1
$chunk1$eval
[1] FALSE
$chunk2
$chunk2$fig.cap
[1] "\"I am a fig caption.\""
$chunk2$fig.height
[1] 4
$chunk3
list()
$chunk4
$chunk4$fig.cap
[1] "\"I am another fig caption.\""
$chunk4$eval
[1] TRUE
Note the exclusive use of |>
base pipe, _
placeholder, \()
anonymous function and ()()
‘dog’s balls’. Welcome to base R circa 2023.
I’m telling you this works and you can see that’s the case. But gosh bless you if you actually try to grok that code by eye.
Brief grief
Nothing good has come of this. I just have more respect for {parsermd}. But here’s a brief timeline of my activites on this task:
- there’s no way this isn’t trivial (denial)
- I can’t believe I can’t batter {parsermd} into doing this (anger)
- I’ll start a package and see where we get (bargaining)
- what if I simply ignore the package and lark about with base R instead (depression)
- I’ll just write a blog post and move on (acceptance)
Don’t worry, you only lost a few braincells reading this. I lost an afternoon, woe is me.
Environment
Session info
Last rendered: 2024-01-13 09:55:00 GMT
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] chunktop_0.0.0.9000 parsermd_0.1.2
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 cli_3.6.2 knitr_1.45 rlang_1.1.3
[5] xfun_0.41 purrr_1.0.2 jsonlite_1.8.7 glue_1.7.0
[9] backports_1.4.1 htmltools_0.5.6.1 hms_1.1.3 fansi_1.0.6
[13] rmarkdown_2.25 evaluate_0.23 tibble_3.2.1 tzdb_0.4.0
[17] fontawesome_0.5.2 fastmap_1.1.1 yaml_2.3.8 lifecycle_1.0.4
[21] compiler_4.3.1 htmlwidgets_1.6.2 Rcpp_1.0.11 pkgconfig_2.0.3
[25] rstudioapi_0.15.0 digest_0.6.33 R6_2.5.1 tidyselect_1.2.0
[29] readr_2.1.4 utf8_1.2.4 pillar_1.9.0 magrittr_2.0.3
[33] checkmate_2.2.0 withr_2.5.2 tools_4.3.1