
How to make code chunks depend on all previous chunks in knitr/rmarkdown?

Goal

I want to make my data analysis reproducible by making each chunk depend on all previous chunks. So, if there are 3 chunks and I change something in the 1st chunk, the subsequent 2 chunks should re-run so that their outputs reflect the change. I want to set this in the global chunk options at the top of the document so that I don't have to use `dependson` multiple times.

Problems

The outputs of a chunk don't change if the chunk is not modified and `cache=TRUE`. For chunks that contain the code directly, I can make them depend on all previous ones using the following at the top of the document:

```{r setup, echo=FALSE}
# set global chunk options: 
library(knitr)
opts_chunk$set(cache=TRUE, autodep = TRUE)
dep_auto()
```

With this, if any chunk is changed, all subsequent chunks are re-run. But this does not work if I use `source()` in chunks to read R scripts. The following is an example document:

---
title: "Untitled"
output: html_document
---
```{r setup, echo=FALSE}
# set global chunk options: 
library(knitr)
opts_chunk$set(cache=TRUE, autodep = TRUE)
dep_auto()
```


# Create Data
```{r}
#source("data1.R")
x <- data.frame(col1 = 4:10, col2 = 6:12)
x
```

# Summaries
```{r}
#source("data2.R")

median1.of.x <- sapply(x, function(x) median(x)-1)

sd.of.x <- sapply(x, sd)

plus.of.x <- sapply(x, function(x) mean(x)+1)

jj <- rbind(plus.of.x, sd.of.x, median1.of.x)

```

```{r}
jj
```

Now, if I change either of the first two chunks, the third chunk gives the correct output after knitting. But if I instead put the first chunk's code in a source file `data1.R` and the second chunk's code in `data2.R`, keeping the global chunk options the same as before, changes made in the source files are not reflected correctly in the output of the third chunk. For example, changing `x` to `x <- data.frame(col1 = 5:11, col2 = 6:12)` should yield:

 > jj
                 col1      col2
plus.of.x    9.000000 10.000000
sd.of.x      2.160247  2.160247
median1.of.x 8.000000  9.000000 

But when `source()` is used as described above, the knitted document reports:

 jj
##                col1      col2
## mean.of.x  5.000000  9.000000
## sd.of.x    2.160247  2.160247
## minus.of.x 6.000000 10.000000 

What settings do I need to change to use `source()` correctly in knitr documents?

When you use `source()`, knitr is unable to analyze the objects that may be created by it; knitr must be able to see the full source code to analyze the dependencies among code chunks. There are two approaches to solve your problem:

  1. Tell the second chunk that it depends on the value of `x` by adding an arbitrary chunk option that uses the value of `x`, e.g. a chunk header of the form `{r cache.extra = x}`; then whenever `x` changes, the cache of this code chunk will be automatically invalidated;
  2. Let knitr see the full source code; you can pass the source code to a code chunk via the chunk option `code`, e.g. `{r code = readLines('data1.R')}` (same for `data2.R`); then `dep_auto()` should be able to figure out that `x` was created in the first chunk and used in the second chunk, so the second chunk must depend on the first chunk.
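As a rough sketch, the two approaches could look like this in the example document. The chunk labels are hypothetical, and hashing the script files with `tools::md5sum()` is an assumption added here (the answer itself only mentions `cache.extra = x`), on the reasoning that the sourcing chunk also has to be invalidated when its script changes. The two variants are alternatives, not meant to be combined in one document.

````markdown
<!-- Approach 1: tie each cache to what the chunk really depends on -->
```{r data1-hash, cache.extra = tools::md5sum("data1.R")}
source("data1.R")
```

```{r data2-hash, cache.extra = list(x, tools::md5sum("data2.R"))}
source("data2.R")
```

<!-- Approach 2: hand knitr the script contents instead of calling source() -->
```{r data1-code, code = readLines("data1.R")}
```

```{r data2-code, code = readLines("data2.R")}
```
````

With the `code` option the chunk body is left empty, since the option supplies the code that knitr analyzes and runs.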

I found that this works (knitr 1.17):

<<..., dependson=all_labels()>>=
...
@
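The snippet above uses Rnw (Sweave-style) chunk syntax. In an R Markdown document like the one in the question, the same option would presumably be written in the chunk header as below; the label `show-jj` is made up for illustration, and this translation has not been tested against knitr 1.17.

````markdown
```{r show-jj, dependson = all_labels()}
jj
```
````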

I think, by default, chunks do depend on previous chunks, and the author went to great lengths to make each chunk start with the same environment that the last one ended with (although there are numerous ways of screwing this up, like sourcing files with caching turned on...).

I can't recall the syntax, but you can include knitr chunks from external documents. There is also a trick to reuse knitr chunks in the same doc in a function-like manner by reusing the label, and you may be able to build some non-linear dependency from this.

But why not set `cache` to `FALSE` when you don't want caching? Sourcing seems like a bad idea, but I can't put my finger on why. I would make the knitr workflow linear and put the logic in functions, and turn off caching if the same function call can return different things with the same input parameters.
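One way to read the suggestion about putting logic in functions is sketched below. The helper name `summarise_columns` is hypothetical; its body is the code from the question's second chunk, wrapped so that it takes `x` as an explicit input.

```r
# Hypothetical helper replacing the script data2.R: the summaries from the
# question's second chunk, computed from an explicit argument.
summarise_columns <- function(x) {
  rbind(
    plus.of.x    = sapply(x, function(col) mean(col) + 1),
    sd.of.x      = sapply(x, sd),
    median1.of.x = sapply(x, function(col) median(col) - 1)
  )
}

# The data from the question, after the change to col1
x <- data.frame(col1 = 5:11, col2 = 6:12)

# Same inputs always give the same result, so caching this call is safe
jj <- summarise_columns(x)
jj
```

If the function definition and the call both sit in chunks, knitr sees the actual code rather than an opaque `source()` call.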

Another trick that might be useful to you is the recently added ability to knit a document using input parameters. This could possibly extract some logic from your knitr doc, which I think is the avoidable root of your problems.
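A minimal sketch of what that could look like, assuming the `params` field of the R Markdown YAML header; the parameter name `start` and the chunk label are made up for illustration. Passing `cache.extra = params` is an added assumption, intended to invalidate the cached chunk when a parameter value changes.

````markdown
---
title: "Untitled"
output: html_document
params:
  start: 4
---

```{r create-data, cache.extra = params}
x <- data.frame(col1 = params$start:(params$start + 6), col2 = 6:12)
x
```
````

The document could then be knitted with a different value, for example `rmarkdown::render("analysis.Rmd", params = list(start = 5))`, where `analysis.Rmd` is a placeholder file name.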
