Merging data frames in R4.0.0 60x slower than R3.6.3?

Merging 2 data frames in R4.0.0 is much slower than in R3.6.3.

Reproducible example:

library(reshape2)

n <- matrix(1:6000000, nrow=15000, ncol=400)
colnames(n) <- 1:ncol(n)

meta <- data.frame("col1" = ncol(n):1, row.names = colnames(n))
start_time <- Sys.time()
test = sapply(1:nrow(n), 
               function(i) {
                 print(i)
                 nn = reshape2::melt(n[i,])
                 tmp = merge(nn, meta, by="row.names");
               }
)
end_time <- Sys.time()
end_time-start_time

This code takes 23 seconds in R3.6.3 and 23 minutes in R4.0.0 (on my machine), making the merge 60 times slower. The 'melt' function is not the time-consuming step.
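One way to check that claim is to time the two steps separately. The following is a minimal sketch, not part of the original post: it assumes reshape2 is installed, uses a much smaller matrix than the one above so that it finishes quickly, and the names melt_time and merge_time are made up for illustration. Absolute timings will vary by machine and R version.

```r
library(reshape2)

# Smaller matrix than in the question (assumption, to keep the check fast).
n <- matrix(1:40000, nrow = 100, ncol = 400)
colnames(n) <- 1:ncol(n)
meta <- data.frame(col1 = ncol(n):1, row.names = colnames(n))

# Time the melt step on its own.
melt_time <- system.time(
  for (i in 1:nrow(n)) nn <- reshape2::melt(n[i, ])
)

# Time the merge step on its own, reusing a single melted row.
nn <- reshape2::melt(n[1, ])
merge_time <- system.time(
  for (i in 1:nrow(n)) tmp <- merge(nn, meta, by = "row.names")
)

# Compare the elapsed wall-clock time of the two steps.
print(melt_time[["elapsed"]])
print(merge_time[["elapsed"]])
```

If the merge loop dominates the elapsed time, that is consistent with the observation that merge, not melt, is the slow step.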

This issue is also not related to sapply. You can reproduce the extreme speed difference using a for loop:

for(i in 1:nrow(n)){
  print(i)
  nn = reshape2::melt(n[i,])
  tmp = merge(nn, meta, by="row.names");
}

I'd be happy to hear your feedback. Am I missing something?

(I previously asked this question in another context and deleted the old question since it wasn't reproducible.)

This is actually a bug in R4.0.0.

I have submitted it to R-core ( https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17794 ) and will update here once it is fixed.

Update: The bug has been fixed by the R team (thanks to Martin Maechler) and is already rolled out in R-devel. The fix will be included in the next release version of R.
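For anyone who has to stay on R4.0.0 in the meantime, one possible way to sidestep merge for this particular case is a direct row-name lookup with match. This is only a sketch under assumptions and is not from the original answer: it uses base R only, a smaller matrix, and the illustrative names row_values and idx. Unlike merge, it keeps the original row order instead of sorting by the key.

```r
# Smaller matrix than in the question (assumption, for a quick check).
n <- matrix(1:4000, nrow = 10, ncol = 400)
colnames(n) <- 1:ncol(n)
meta <- data.frame(col1 = ncol(n):1, row.names = colnames(n))

# One row of the matrix as a named vector (names are the column names).
row_values <- n[1, ]

# Position of each name within the row names of meta.
idx <- match(names(row_values), rownames(meta))

# Build the joined table by hand instead of calling merge().
tmp <- data.frame(
  Row.names = names(row_values),
  value     = unname(row_values),
  col1      = meta$col1[idx],
  row.names = NULL
)

head(tmp)
```

This only covers a one-to-one join on row names, which is what the example in the question does; it is not a general replacement for merge.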
