Split a data.table by the sum of a column in R

Question

How can I split a data.table by equal cumulative sums of the N column? The data contain codes, and N is the number of rows for each code in a much larger data set (which I haven't reproduced here).

I'd like to split the codes at approximately every 50,000 of the cumulative sum of N, producing data.tables with varying numbers of rows, each holding unique codes whose N values sum to approximately 50,000.

In reality the N values are random, not patterned, but this does a good job of replicating the data for a small sample:

library(data.table)

# Sample data: N cycles through 100:500
dt <- data.table(code = 1:500, N = rep(100:500, 500))
dt$cumsum <- cumsum(dt$N)
desired1 <- dt[1:233, ]    # first ~50,000 of cumsum(N)
desired2 <- dt[234:359, ]
desired3 <- dt[360:565, ]
desired4 <- dt[566:713, ]  # etc., a new table at every ~50,000 of cumsum(N)

Answer 1

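We create a grouping variable with %/% for splitting. Integer-dividing the running total by 50000 gives a bucket number for each row, and shift() lags that number by one row so that the row crossing each multiple of 50,000 stays in the group it completes.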

dt[, grp := shift(cumsum %/% 50000, fill = 0)]
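To see what the shift changes, here is a minimal sketch on a toy table, using a threshold of 100 instead of 50,000. The table toy, its values, and the helper column raw are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy data; threshold is 100 instead of 50,000
toy <- data.table(code = 1:6, N = c(30, 40, 50, 20, 70, 10))
toy[, cumsum := cumsum(N)]            # running total: 30 70 120 140 210 220

# Without the shift, the row that crosses a multiple of 100 opens a new group
toy[, raw := cumsum %/% 100]          # 0 0 1 1 2 2

# Lagging by one row keeps the crossing row in the group it completes
toy[, grp := shift(raw, fill = 0)]    # 0 0 0 1 1 2
```

With the shift, each group ends on the first row whose running total reaches the next multiple of the threshold, so group totals come out close to the threshold rather than exactly equal to it.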

and then do the split

lst <- split(dt, dt$grp)
tail(lst[[1]], 1)
#   code   N cumsum grp
#1:  233 332  50328   0
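One way to check the result is to total N within each group. The sketch below is self-contained and uses a smaller stand-in for the question's data (an assumption: two passes over 100:500, with code simply numbering the rows). The name summary_dt is hypothetical.

```r
library(data.table)

# Smaller stand-in for the question's data (assumption, see above)
N <- rep(100:500, 2)
dt <- data.table(code = seq_along(N), N = N)
dt[, cumsum := cumsum(N)]
dt[, grp := shift(cumsum %/% 50000, fill = 0)]

# One row per group: number of codes and the sum of their N values
summary_dt <- dt[, .(rows = .N, total = sum(N)), by = grp]
print(summary_dt)
```

The first group should hold 233 rows totalling 50,328, matching the last row of lst[[1]] shown above. The final group only collects whatever remains, so it can fall well short of 50,000.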

Answer 1: score 2, accepted, 2018-07-16 04:58:31
