
Quickly split a large vector into chunks in R

My question is very closely related to this one:

Split a vector into chunks in R

I'm trying to split a large vector into chunks of a known size, and it's slow. A solution for vectors with even remainders is here:

A quick solution when a factor exists is here:

Split dataframe into equal parts based on length of the dataframe

I would like to handle the case where no (large) factor exists, since I want fairly large chunks.

My example, using a vector much smaller than the one in my real-life application:

d <- 1:6510321
# Sloooow
chunks <- split(d, ceiling(seq_along(d)/2000))

Using llply from the plyr package I was able to reduce the time.

chunks <- function(d, n){      
    chunks <- split(d, ceiling(seq_along(d)/n))
    names(chunks) <- NULL
    return(chunks)
 }

library(plyr)
plyrChunks <- function(d, n){
     # Start index of each chunk, and the matching end index (capped at length(d))
     starts <- seq(from = 1, to = length(d), by = ceiling(n))
     ends <- pmin(starts + ceiling(n) - 1, length(d))
     # Subset by index range instead of grouping every element with split()
     chunks <- llply(seq_along(starts),
                     function(i) d[starts[i]:ends[i]])
     return(chunks)
 }

 # testing
 d <- 1:6510321
 n <- 2000

 system.time(chks <- chunks(d,n))
 #    user  system elapsed 
 #   5.472   0.000   5.472 

 system.time(plyrChks <- plyrChunks(d, n))
 #    user  system elapsed 
 #   0.068   0.000   0.065 

 identical(chks, plyrChks)
 # TRUE
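
The same index-range idea does not strictly need plyr. Below is a minimal base R sketch; `chunk_by_index` is a hypothetical helper name, not part of the original answer, and it has not been timed against the versions above.

```r
# Hypothetical helper: cut d into consecutive chunks of (at most) n elements
chunk_by_index <- function(d, n) {
  if (length(d) == 0) return(list())
  # First and last position of every chunk; the last chunk may be shorter
  starts <- seq(1, length(d), by = n)
  ends <- pmin(starts + n - 1, length(d))
  # Map() pairs each start with its end and returns an unnamed list
  Map(function(s, e) d[s:e], starts, ends)
}

small <- 1:10
chunk_by_index(small, 3)
```

On small inputs the result should match the `split()` version once its names are dropped, including the case where a single element is left over at the end.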

You can speed this up even more using the .parallel parameter of the llply function, or add a progress bar using the .progress parameter.
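
As a rough sketch of how those two parameters might be used: `chunk_llply` below is a hypothetical wrapper (not from the original answer) that forwards extra arguments to `llply`. The parallel branch assumes the doParallel package as the foreach backend and is skipped if it is not installed; for work this cheap per chunk, the parallel overhead may well outweigh any gain, so it is worth timing both.

```r
library(plyr)

# Hypothetical wrapper: index-range chunking, extra arguments go to llply()
chunk_llply <- function(d, n, ...) {
  starts <- seq(1, length(d), by = n)
  ends <- pmin(starts + n - 1, length(d))
  llply(seq_along(starts), function(i) d[starts[i]:ends[i]], ...)
}

d <- 1:6510321

# Text progress bar while the chunks are built
chks_progress <- chunk_llply(d, 2000, .progress = "text")

# Parallel run: assumes doParallel is available as the backend
if (requireNamespace("doParallel", quietly = TRUE)) {
  doParallel::registerDoParallel(cores = 2)
  chks_parallel <- chunk_llply(d, 2000, .parallel = TRUE)
  doParallel::stopImplicitCluster()
}
```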

A speed-up using the parallel package:

chunks <- parallel::splitIndices(6510321, ncl = ceiling(6510321/2000))
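
Note that `splitIndices` returns index vectors rather than the data itself, and it divides the range into `ncl` groups of approximately equal size, so chunks are not guaranteed to hold exactly 2000 elements. A small sketch of turning those indices into chunks of `d`, with variable names chosen here for illustration:

```r
d <- 1:6510321
n <- 2000

# One index vector per chunk; the number of chunks is fixed, sizes are near-equal
idx <- parallel::splitIndices(length(d), ncl = ceiling(length(d) / n))

# Use each index vector to pull the matching elements out of d
chunks <- lapply(idx, function(i) d[i])

# Inspect the smallest and largest chunk sizes
range(lengths(chunks))
```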
