
How to make code chunks depend on all previous chunks in knitr/rmarkdown?

Goal

I want to make my data analysis reproducible by having each chunk depend on all previous chunks. If there are three chunks and I change something in the first, the two subsequent chunks should re-run so that their outputs reflect the change. I want to set this in the global chunk options at the top of the document so that I don't have to repeat dependson in every chunk.

Problems

With cache=TRUE, the output of a chunk does not change unless the chunk itself is modified. For chunks that contain their code directly, I can make them depend on all previous chunks by putting the following at the top of the document:

```{r setup, echo=FALSE}
# set global chunk options: 
library(knitr)
opts_chunk$set(cache=TRUE, autodep = TRUE)
dep_auto()
```

If any of the chunks above is changed, all subsequent chunks are re-run. But this does not work if I use source() in chunks to read R scripts. The following is an example document:

---
title: "Untitled"
output: html_document
---
```{r setup, echo=FALSE}
# set global chunk options: 
library(knitr)
opts_chunk$set(cache=TRUE, autodep = TRUE)
dep_auto()
```


# Create Data
```{r}
#source("data1.R")
x <- data.frame(col1 = 4:10, col2 = 6:12)
x
```

# Summaries
```{r}
#source("data2.R")

median1.of.x <- sapply(x, function(x) median(x)-1)

sd.of.x <- sapply(x, sd)

plus.of.x <- sapply(x, function(x) mean(x)+1)

jj <- rbind(plus.of.x, sd.of.x, median1.of.x)

```

```{r}
jj
```

Now, if I change either of the first two chunks, the third chunk gives the correct output after knitting. But if I instead put the first chunk's code in a source file data1.R and the second chunk's code in data2.R, keeping the global chunk options the same as before, changes made in the source files are not correctly reflected in the output of the third chunk. For example, changing x to x <- data.frame(col1 = 5:11, col2 = 6:12) should yield:

 > jj
                 col1      col2
plus.of.x    9.000000 10.000000
sd.of.x      2.160247  2.160247
median1.of.x 8.000000  9.000000 

But when source() is used as described above, the knitr document reports:

 jj
##                col1      col2
## mean.of.x  5.000000  9.000000
## sd.of.x    2.160247  2.160247
## minus.of.x 6.000000 10.000000 

What settings do I need to change to use source() correctly in knitr documents?

When you use source(), knitr is unable to analyze which objects may be created by it; knitr must be able to see the full source code to analyze the dependencies among code chunks. There are two approaches to solve your problem:

  1. Tell the second chunk that it depends on the value of x by adding an arbitrary chunk option that uses the value of x, e.g. ```{r cache.extra = x}; then whenever x changes, the cache of this code chunk is automatically invalidated;
  2. Let knitr see the full source code; you can pass the source code to a code chunk via the chunk option code, e.g. ```{r code = readLines('data1.R')} (and the same for data2.R); then dep_auto() should be able to figure out that x was created in the first chunk and used in the second chunk, so the second chunk must depend on the first chunk.
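
The two approaches might look like this in the example document. This is only a sketch: the chunk labels are made up for illustration, and data1.R and data2.R are assumed to sit next to the .Rmd file. Approach 2, with empty chunk bodies that knitr fills from the code option:

```{r create-data, code = readLines("data1.R")}
```

```{r summaries, code = readLines("data2.R")}
```

Approach 1, keeping source(). The cache.extra = x option is the one from the answer. The tools::md5sum() option on the first chunk is an addition that the answer does not mention: it relies on the same mechanism (any chunk option takes part in the cache key), so that editing data1.R invalidates the chunk that sources it.

```{r create-data-src, cache.extra = tools::md5sum("data1.R")}
source("data1.R")  # re-run whenever the file's checksum changes
```

```{r summaries-src, cache.extra = list(x, tools::md5sum("data2.R"))}
source("data2.R")  # re-run whenever x or data2.R changes
```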

I found that this works (knitr 1.17):

<<..., dependson=all_labels()>>=
...
@
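
The snippet above uses Rnw (Sweave-style) chunk syntax. In an R Markdown document like the one in the question, the same option would presumably go in the chunk header; an untested sketch, with the label final-table made up for illustration:

```{r final-table, dependson = knitr::all_labels()}
jj
```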

I think that, by default, chunks do depend on previous chunks, and the author of knitr went to great lengths to make each chunk start with the same environment that the previous one ended with (although there are numerous ways of breaking this, such as sourcing files with caching turned on). I can't recall the syntax, but you can include knitr chunks from external documents. There is also a trick to reuse knitr chunks within the same document in a function-like manner by reusing the label, and you may be able to build some non-linear dependency from this. But why not set cache to FALSE when you don't want caching? Sourcing seems like a bad idea, but I can't put my finger on why. I would make the knitr workflow linear, put the logic in functions, and turn off caching if the same function call can return different things for the same input parameters.
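
On the syntax for external chunks: knitr provides read_chunk() for pulling labelled code out of an R script. A minimal sketch, assuming data1.R is edited to carry a marker line (the labels create-data and read-external are made up for illustration):

```r
## ---- create-data ----
x <- data.frame(col1 = 4:10, col2 = 6:12)
x
```

In the .Rmd, one uncached chunk reads the script, and an empty chunk with the matching label receives the code:

```{r read-external, cache=FALSE}
knitr::read_chunk("data1.R")
```

```{r create-data}
```

Because the code ends up in the chunk body, knitr can see it, so editing data1.R should change the chunk's cache key and autodep should be able to detect that x is created there.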

Another trick that might be useful to you is the recently added ability to knit a document using input parameters. This could possibly extract some logic from your knitr document, which I think is the avoidable root of your problems.
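
A sketch of what a parameterized version of the example document could look like. The parameter name col1_start and the file name analysis.Rmd are hypothetical; the cache.extra option is an assumption added here, since with cache=TRUE a changed parameter value alone would presumably not invalidate the chunk's cache.

---
title: "Untitled"
output: html_document
params:
  col1_start: 4
---

```{r create-data, cache.extra = params$col1_start}
# params is the read-only list built from the YAML header
x <- data.frame(col1 = params$col1_start + 0:6, col2 = 6:12)
x
```

The document can then be rendered with a different value from the R console:

```r
# Overrides the default col1_start = 4 declared in the YAML header
rmarkdown::render("analysis.Rmd", params = list(col1_start = 5))
```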
