Partition matrix into N equally-sized chunks with R

How can I partition a matrix or data frame into N equally-sized chunks with R? I want to cut it horizontally, that is, split it by rows.

For example, given:

r = 8
c = 10
number_of_chunks = 4
data = matrix(seq(r*c), nrow = r, ncol=c)
data

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    9   17   25   33   41   49   57   65    73
[2,]    2   10   18   26   34   42   50   58   66    74
[3,]    3   11   19   27   35   43   51   59   67    75
[4,]    4   12   20   28   36   44   52   60   68    76
[5,]    5   13   21   29   37   45   53   61   69    77
[6,]    6   14   22   30   38   46   54   62   70    78
[7,]    7   15   23   31   39   47   55   63   71    79
[8,]    8   16   24   32   40   48   56   64   72    80

I would like to cut data into a list of 4 elements:

Element 1:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    9   17   25   33   41   49   57   65    73
[2,]    2   10   18   26   34   42   50   58   66    74

Element 2:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[3,]    3   11   19   27   35   43   51   59   67    75
[4,]    4   12   20   28   36   44   52   60   68    76

Element 3:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[5,]    5   13   21   29   37   45   53   61   69    77
[6,]    6   14   22   30   38   46   54   62   70    78

Element 4:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[7,]    7   15   23   31   39   47   55   63   71    79
[8,]    8   16   24   32   40   48   56   64   72    80

With NumPy in Python, I can use numpy.array_split.

Here's an attempt in base R. Calculate "pretty" cut values for the sequence of row numbers using pretty. Categorize the row numbers with cut, then use split to return a list of the row numbers divided at the cut values. Finally, run through the list of split row numbers using lapply and extract the matrix subsets with [.

lapply(split(seq_len(nrow(data)),
             cut(seq_len(nrow(data)), pretty(seq_len(nrow(data)), number_of_chunks))),
       function(x) data[x, ])
$`(0,2]`
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    9   17   25   33   41   49   57   65    73
[2,]    2   10   18   26   34   42   50   58   66    74

$`(2,4]`
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    3   11   19   27   35   43   51   59   67    75
[2,]    4   12   20   28   36   44   52   60   68    76

$`(4,6]`
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    5   13   21   29   37   45   53   61   69    77
[2,]    6   14   22   30   38   46   54   62   70    78

$`(6,8]`
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    7   15   23   31   39   47   55   63   71    79
[2,]    8   16   24   32   40   48   56   64   72    80
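
The intermediate values of this pipeline can be inspected one step at a time. The sketch below rebuilds the example matrix and uses illustrative variable names (rows, breaks, bins, idx) that are not part of the answer above:

```r
# Rebuild the example matrix from the question
data <- matrix(seq(8 * 10), nrow = 8, ncol = 10)
number_of_chunks <- 4

rows <- seq_len(nrow(data))               # 1 2 3 4 5 6 7 8

# Step 1: "pretty" break points covering the row numbers
breaks <- pretty(rows, number_of_chunks)  # 0 2 4 6 8

# Step 2: assign every row number to the interval it falls in
bins <- cut(rows, breaks)                 # factor with levels (0,2] (2,4] (4,6] (6,8]

# Step 3: group the row numbers by interval
idx <- split(rows, bins)                  # list of 4 integer vectors

# Step 4: extract one sub-matrix per group of row numbers
chunks <- lapply(idx, function(x) data[x, ])
```

The list names such as `(0,2]` in the output are simply the interval labels produced by cut.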

Roll this into a function:

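array_split <- function(data, number_of_chunks) {
  # Row indices 1..nrow(data)
  rowIdx <- seq_len(nrow(data))
  # Bin the indices at "pretty" break points, then subset the rows bin by bin;
  # drop = FALSE keeps a one-row chunk as a matrix instead of a vector
  lapply(split(rowIdx, cut(rowIdx, pretty(rowIdx, number_of_chunks))),
         function(x) data[x, , drop = FALSE])
}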

Then, you can use

array_split(data=data, number_of_chunks=number_of_chunks)

to return the same result as above.


A nice simplification suggested by @user20650 is

split.data.frame(data,
                 cut(seq_len(nrow(data)), pretty(seq_len(nrow(data)), number_of_chunks)))

To my surprise, split.data.frame returns a list of matrices when its first argument is a matrix.
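
One caveat with the pretty-based approach: pretty is documented to return about n + 1 "round" values, so the number of chunks is not guaranteed to equal number_of_chunks when the row count does not divide evenly. As a rough alternative, the sketch below mimics the sizing rule that numpy.array_split documents (the first nrow %% n chunks get one extra row). The function name array_split_rows and its argument names are made up for this illustration:

```r
# Hypothetical helper: split the rows of a matrix or data frame into
# n_chunks pieces whose sizes differ by at most one row.
array_split_rows <- function(x, n_chunks) {
  n <- nrow(x)
  # Base size for every chunk, plus one extra row for the first (n %% n_chunks)
  sizes <- rep(n %/% n_chunks, n_chunks) + (seq_len(n_chunks) <= n %% n_chunks)
  # Chunk id of every row, e.g. 1 1 1 2 2 2 3 3 4 4 for n = 10, n_chunks = 4
  ids <- rep(seq_len(n_chunks), times = sizes)
  # Split the row indices, not the data, then subset once per chunk
  lapply(split(seq_len(n), ids), function(i) x[i, , drop = FALSE])
}

# Usage: 10 rows into 4 chunks
m <- matrix(seq(10 * 3), nrow = 10, ncol = 3)
chunks <- array_split_rows(m, 4)
```

Unlike numpy.array_split, this sketch returns fewer than n_chunks elements when there are more chunks than rows, because empty groups are not kept.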

number_of_chunks = 4
# Rows per chunk, rounded up so that every row is covered
chunk_size = ceiling(NROW(data) / number_of_chunks)
lapply(seq(1, NROW(data), chunk_size),
       function(i) data[i:min(i + chunk_size - 1, NROW(data)), , drop = FALSE])

Or:

# Assumes NROW(data) is a multiple of number_of_chunks
lapply(split(data, rep(1:number_of_chunks, each = NROW(data)/number_of_chunks)),
       function(a) matrix(a, ncol = NCOL(data)))

Try not to split the data explicitly, because that creates another copy. It is better to split the indices you want to access.

With this function, you can split by number of chunks (for parallelism) or by size of the chunks.

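CutBySize <- function(m, block.size, nb = ceiling(m / block.size)) {
  int <- m / nb                     # average (possibly fractional) block size
  upper <- round(1:nb * int)        # last index of each block
  lower <- c(1, upper[-nb] + 1)     # first index of each block
  size <- c(upper[1], diff(upper))  # number of indices in each block
  cbind(lower, upper, size)
}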

CutBySize(nrow(data), nb = number_of_chunks)

     lower upper size
[1,]     1     2    2
[2,]     3     4    2
[3,]     5     6    2
[4,]     7     8    2
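
The bounds can then be used to pull out each chunk only when it is needed. A minimal sketch, assuming the example matrix from the question; the names bounds and chunks are illustrative, and CutBySize is repeated so the block runs on its own:

```r
# Same index-splitting function as in the answer
CutBySize <- function(m, block.size, nb = ceiling(m / block.size)) {
  int <- m / nb
  upper <- round(1:nb * int)
  lower <- c(1, upper[-nb] + 1)
  size <- c(upper[1], diff(upper))
  cbind(lower, upper, size)
}

data <- matrix(seq(8 * 10), nrow = 8, ncol = 10)
number_of_chunks <- 4

# One row of bounds per chunk: first row index, last row index, chunk size
bounds <- CutBySize(nrow(data), nb = number_of_chunks)

# Subset the rows lower:upper for each chunk; drop = FALSE keeps matrices
chunks <- lapply(seq_len(nrow(bounds)), function(k) {
  data[bounds[k, "lower"]:bounds[k, "upper"], , drop = FALSE]
})
```

In a parallel setting, only the small bounds matrix would need to be passed around, with each worker doing its own subsetting.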
