Merging data frames in R 4.0.0 60x slower than in R 3.6.3?
Merging two data frames in R 4.0.0 is much slower than in R 3.6.3.
Reproducible example:
library(reshape2)
# 15000 x 400 integer matrix; its column names match the row names of meta
n <- matrix(1:6000000, nrow=15000, ncol=400)
colnames(n) <- 1:ncol(n)
meta <- data.frame("col1" = ncol(n):1, row.names = colnames(n))
start_time <- Sys.time()
test = sapply(1:nrow(n),
  function(i) {
    print(i)
    nn = reshape2::melt(n[i,])             # one matrix row as a data frame
    tmp = merge(nn, meta, by="row.names")  # the step that is slow in R 4.0.0
  }
)
end_time <- Sys.time()
end_time-start_time
This code takes 23 seconds in R 3.6.3 and 23 minutes in R 4.0.0 (on my machine), making the merge 60 times slower. The 'melt' function is not the time-consuming step.
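One way to check which step dominates is to time the reshape and the merge separately. The following is only a minimal sketch, not the code used in the original measurement: it assumes that `data.frame(value = n[i, ])` is an acceptable base-R stand-in for `reshape2::melt` on a named vector, and the 50-row subset and the names `rows`, `frames`, `merged`, `t_reshape` and `t_merge` are arbitrary choices made for illustration.

```r
# Same data as in the question, built with base R only
n <- matrix(1:6000000, nrow = 15000, ncol = 400)
colnames(n) <- 1:ncol(n)
meta <- data.frame(col1 = ncol(n):1, row.names = colnames(n))

rows <- 1:50  # small subset (assumption) to keep the run short

# Step 1: reshape only (assumed stand-in for reshape2::melt on a named vector)
t_reshape <- system.time(
  frames <- lapply(rows, function(i) data.frame(value = n[i, ]))
)

# Step 2: merge only, on the row names as in the question
t_merge <- system.time(
  merged <- lapply(frames, function(nn) merge(nn, meta, by = "row.names"))
)

# Elapsed seconds per step
print(c(reshape = t_reshape[["elapsed"]], merge = t_merge[["elapsed"]]))
```

Running this under both R versions should show whether the gap comes from the `merge` call rather than from the reshape; the absolute timings will depend on the machine.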
This issue is also not related to sapply. You can reproduce the extreme speed difference using a for loop:
for(i in 1:nrow(n)){
  print(i)
  nn = reshape2::melt(n[i,])
  tmp = merge(nn, meta, by="row.names")
}
I'd be happy to hear your feedback. Am I missing something?
(I previously asked this question in another context and deleted the old question because it wasn't reproducible.)
This is actually a bug in R 4.0.0.
I have submitted it to R-core ( https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17794 ) and will update here once it is fixed.
Update: The bug has been fixed by the R team (thanks to Martin Maechler). The fix is already rolled out in R-devel and will be included in the next release version of R.
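Until an upgrade is possible, a script can at least flag when it runs on the affected release. This is a small sketch under the assumption that only R 4.0.0 is affected, which is the only version named above; the variable name `affected` and the message texts are made up for illustration.

```r
# Only R 4.0.0 is named as affected; the release carrying the fix is not stated here.
affected <- getRversion() == "4.0.0"

if (affected) {
  warning("Running R 4.0.0: merge(..., by = \"row.names\") may be very slow.")
} else {
  message("R ", as.character(getRversion()), ": not the release reported as affected.")
}
```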