Best way to replace a nested for loop in R
for loops in R are generally considered slow: it's hard to avoid unintended memory reads and writes. But how can a nested for loop be replaced? Which is the best approach?
Please note that this is a generic question: the f function below is just an example; it could be much more complicated or return different objects. I just want to see all the different approaches that one can take in R to avoid nested for loops.
Consider this as an example:
al <- c(2, 3, 4)
bl <- c("foo", "bar")
f <- function(n, c) { # Just one simple example function, could be much more complicated
  data.frame(n = n, c = c, val = n * nchar(c))
}
d <- data.frame()
for (a in al) {
  for (b in bl) {
    d <- rbind(d, f(a, b))
    # one could undoubtedly do this a lot better
    # even keeping to nested for loops
  }
}
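The comments in the loop above hint that the nested loops can be kept while avoiding the repeated rbind, which copies the growing data frame on every iteration. A minimal sketch, assuming f always returns a data frame with the same columns (the names res and k are made up for this illustration):

```r
al <- c(2, 3, 4)
bl <- c("foo", "bar")

f <- function(n, c) {
  data.frame(n = n, c = c, val = n * nchar(c))
}

# Preallocate a list with one slot per (a, b) combination
res <- vector("list", length(al) * length(bl))
k <- 0
for (a in al) {
  for (b in bl) {
    k <- k + 1
    res[[k]] <- f(a, b)  # store each piece; earlier results are not copied
  }
}

# Bind all pieces in a single call
d <- do.call(rbind, res)
```

The row order is the same as in the original loop; only the number of rbind calls changes.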
One could replace it in this absolutely horrible way (take this only as a crude example):
eg <- expand.grid(al, bl)
d <- do.call(rbind,
  lapply(seq_len(nrow(eg)),
    function(i) {f(as.numeric(eg[i, 1]), as.character(eg[i, 2]))}
  )
)
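Row-by-row indexing is not required to walk over the grid. One possible alternative, sketched here with base R's Map and assuming the grid columns are named n and c (names chosen for this illustration), passes the two columns to f in parallel:

```r
al <- c(2, 3, 4)
bl <- c("foo", "bar")
f <- function(n, c) data.frame(n = n, c = c, val = n * nchar(c))

# bl comes first so that it varies fastest, matching the order of the nested loop
eg <- expand.grid(c = bl, n = al, stringsAsFactors = FALSE)

# Map calls f(eg$n[i], eg$c[i]) for each i and returns a list of data frames
d <- do.call(rbind, Map(f, eg$n, eg$c))
```

Because stringsAsFactors = FALSE keeps the strings as character vectors, no as.character conversion is needed inside the call.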
or using library(purrr), which is a little less inelegant:
d <- map_dfr(bl, function(b) map2_dfr(al, b, f))
... there are countless different methods. Which one is the simplest, and which one is the fastest?
Here is a very quick evaluation of the performance of the three previous methods on my laptop:
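Timings of this kind depend heavily on the machine and on f, so no numbers are shown here. As a rough sketch of how such a comparison could be run with base R's system.time (the helper names loop_version and grid_version and the repetition count of 200 are assumptions for illustration, and the purrr variant is left out to keep the sketch free of dependencies):

```r
al <- c(2, 3, 4)
bl <- c("foo", "bar")
f <- function(n, c) data.frame(n = n, c = c, val = n * nchar(c))

# Nested for loop that grows the data frame with rbind
loop_version <- function() {
  d <- data.frame()
  for (a in al) for (b in bl) d <- rbind(d, f(a, b))
  d
}

# expand.grid + lapply + a single rbind
grid_version <- function() {
  eg <- expand.grid(al, bl, stringsAsFactors = FALSE)
  do.call(rbind, lapply(seq_len(nrow(eg)), function(i) f(eg[i, 1], eg[i, 2])))
}

# Repeat each version so that the elapsed time is measurable
t_loop <- system.time(for (i in 1:200) loop_version())
t_grid <- system.time(for (i in 1:200) grid_version())

# Elapsed seconds for each approach
print(c(loop = t_loop[["elapsed"]], grid = t_grid[["elapsed"]]))
```

With inputs this small the difference is mostly overhead; larger al and bl give a more meaningful picture.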
Simply vectorize with expand.grid and nchar. No for or apply loops needed:
eg <- expand.grid(c=bl, n=al, stringsAsFactors = FALSE)
eg$val <- eg$n * nchar(eg$c)
# RE-ORDER COLUMNS
eg <- eg[c("n", "c", "val")]
Or as a one-liner with transform:
eg <- transform(expand.grid(c=bl, n=al, stringsAsFactors = FALSE),
                val=n * nchar(c))[c("n", "c", "val")]
And if you set stringsAsFactors = FALSE in the f function:
f <- function(n, c) {
  data.frame(n=n, c=c, val=n*nchar(c), stringsAsFactors = FALSE)
}
The output is equivalent to the data frame built by the for loop:
all.equal(d, eg)
# [1] TRUE
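The vectorized approach works because n * nchar(c) operates on whole columns at once. When a function only handles scalars, one option is to wrap it with base R's Vectorize before applying it to the grid. A sketch, where g is a hypothetical scalar-only function invented for this illustration:

```r
al <- c(2, 3, 4)
bl <- c("foo", "bar")

# Hypothetical scalar-only function: `if` needs a single TRUE/FALSE
g <- function(n, c) {
  if (n > 2) n * nchar(c) else 0
}

eg <- expand.grid(c = bl, n = al, stringsAsFactors = FALSE)

# Vectorize wraps g in mapply, so it can be called on whole columns
eg$val <- Vectorize(g)(eg$n, eg$c)
eg <- eg[c("n", "c", "val")]
```

Vectorize still calls g once per row, so it gives the convenience of the vectorized syntax rather than its speed.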
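# Pair every element of al with every element of bl:
# al cycles fastest, each bl value is repeated length(al) times
n <- rep(al, times = length(bl)); e <- rep(bl, each = length(al))
cbind.data.frame(n, c = e, val = mapply(function(x, y) x * nchar(y), n, e))
#   n   c val
# 1 2 foo   6
# 2 3 foo   9
# 3 4 foo  12
# 4 2 bar   6
# 5 3 bar   9
# 6 4 bar  12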
or:
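# outer() fills its result column by column (al varies fastest),
# so e must repeat each bl value length(al) times to line up with val
n <- rep(al, times = length(bl)); e <- rep(bl, each = length(al))
cbind.data.frame(n, c = e, val = c(outer(al, bl, function(x, y) x * nchar(y))))
#   n   c val
# 1 2 foo   6
# 2 3 foo   9
# 3 4 foo  12
# 4 2 bar   6
# 5 3 bar   9
# 6 4 bar  12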