Quickly split a large vector into chunks in R
My question is extremely closely related to this one:
Split a vector into chunks in R
I'm trying to split a large vector into known chunk sizes, and it's slow. A solution for vectors with even remainders is here:
A quick solution when a factor exists is here:
Split dataframe into equal parts based on length of the dataframe
I would like to handle the case where no (large) factor exists, as I want fairly large chunks.
My example, for a vector much smaller than the one in my real-life application:
d <- 1:6510321
# Sloooow
chunks <- split(d, ceiling(seq_along(d)/2000))
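For illustration, here is what this split() pattern produces on a tiny vector (the sizes here are chosen only for demonstration):

```r
# Tiny demonstration of the split() chunking pattern (illustrative sizes only)
d_small <- 1:10
chunks_small <- split(d_small, ceiling(seq_along(d_small)/3))
# chunks_small is a named list: 1:3, 4:6, 7:9, and a final partial chunk 10
```

The last chunk simply holds whatever remainder is left over, so uneven lengths are handled automatically.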
Using llply from the plyr package I was able to reduce the time.
chunks <- function(d, n){
  chunks <- split(d, ceiling(seq_along(d)/n))
  names(chunks) <- NULL
  return(chunks)
}
require(plyr)
plyrChunks <- function(d, n){
  is <- seq(from = 1, to = length(d), by = ceiling(n))
  if(tail(is, 1) != length(d)) {
    is <- c(is, length(d))
  }
  chunks <- llply(head(seq_along(is), -1),
                  function(i){
                    start <- is[i]
                    end <- is[i+1] - 1
                    d[start:end]
                  })
  lc <- length(chunks)
  td <- tail(d, 1)
  chunks[[lc]] <- c(chunks[[lc]], td)
  return(chunks)
}
# testing
d <- 1:6510321
n <- 2000
system.time(chks <- chunks(d,n))
# user system elapsed
# 5.472 0.000 5.472
system.time(plyrChks <- plyrChunks(d, n))
# user system elapsed
# 0.068 0.000 0.065
identical(chks, plyrChks)
# TRUE
You can speed this up even more using the .parallel parameter of llply, or add a progress bar using the .progress parameter.
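As a sketch (assuming the plyr and doParallel packages are installed; the worker count below is an arbitrary choice for illustration), those two options look like this:

```r
library(plyr)

# Progress bar on a sequential run:
res <- llply(1:8, function(i) i^2, .progress = "text")

# Parallel run: requires a registered backend, e.g. doParallel
library(doParallel)
registerDoParallel(cores = 2)  # arbitrary worker count, for illustration only
res_par <- llply(1:8, function(i) i^2, .parallel = TRUE)
```

Parallelism only pays off when each chunk does enough work to outweigh the scheduling overhead.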
A speed improvement using the parallel package:
chunks <- parallel::splitIndices(6510321, ncl = ceiling(6510321/2000))
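Note that splitIndices returns groups of indices rather than the values themselves, so (as a sketch) one extra pass is needed to extract the actual value chunks:

```r
# splitIndices partitions 1:length(d) into roughly equal index groups;
# index d with each group to recover the value chunks
d <- 1:6510321
idx <- parallel::splitIndices(length(d), ncl = ceiling(length(d)/2000))
value_chunks <- lapply(idx, function(i) d[i])
```

Because splitIndices balances group sizes, the chunks come out approximately, not exactly, 2000 elements long.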