Split data.table by cumsum of column in R

How can I split a data.table into pieces with roughly equal cumulative sums of the N column? These data include codes, and N is the number of lines for each code in a much larger data set (which I haven't reproduced here).

I'd like to split the codes at approximately every 50,000 of the cumulative sum of N. This would produce data.tables with varying numbers of rows, each containing unique codes whose N values total approximately 50,000.

In reality the N values are random rather than patterned, but this does a good job of replicating the data for a small sample size:

library(data.table)

# N cycles through 100:500 (401 values) 500 times, i.e. 200,500 rows;
# code = 1:500 is recycled to that length
dt <- data.table(code = 1:500, N = rep(100:500, 500))
dt$cumsum <- cumsum(dt$N)
desired1 <- dt[1:233, ]   # first ~50,000 of cumsum(N)
desired2 <- dt[234:359, ]
desired3 <- dt[360:565, ]
desired4 <- dt[566:713, ] # etc., every ~50,000 of cumsum(N)
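As a quick check on these hand-picked ranges, the base-R sketch below rebuilds only the N column (the same 100:500 cycle) and totals N within each range. It also shows that each range ends at the first row where cumsum(N) reaches the next multiple of 50,000. The names `ranges`, `chunk_totals` and `ends` are illustrative only.

```r
# Rebuild the sample N column: 100:500 cycled 500 times (200,500 values)
N <- rep(100:500, 500)
cs <- cumsum(N)

# Row ranges of desired1..desired4 from the question
ranges <- list(1:233, 234:359, 360:565, 566:713)

# Total N inside each hand-picked chunk
chunk_totals <- sapply(ranges, function(r) sum(N[r]))
print(chunk_totals)  # 50328 49833 49905 49950

# The last row of each chunk is where cumsum(N) first passes k * 50,000
ends <- sapply(ranges, max)
print(cs[ends] %/% 50000)      # 1 2 3 4
print(cs[ends - 1] %/% 50000)  # 0 1 2 3
```

So the target chunks are "everything up to and including the row that crosses each 50,000 boundary", which is the rule a grouping variable has to reproduce.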

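We create a grouping variable for the split with integer division (%/%). shift() lags it by one row (filling the first row with 0), so the row whose cumulative sum first passes each multiple of 50,000 stays in the chunk it completes instead of starting the next one.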

dt[, grp := shift(cumsum %/% 50000, fill = 0)]
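To see why the grouping is lagged, here is a toy example in base R with made-up N values and a chunk size of 10 instead of 50,000. It uses `c(0, head(x, -1))` as a base-R stand-in for data.table's `shift(x, fill = 0)` (a one-row lag with 0 in the first position).

```r
# Toy data: assumed N values and a small chunk size so the effect is visible
N <- c(4, 3, 5, 2, 6, 1, 7)
size <- 10
cs <- cumsum(N)                     # 4 7 12 14 20 21 28

# Without the lag, the row that crosses a boundary opens the next group
grp_unshifted <- cs %/% size        # 0 0 1 1 2 2 2

# Lag by one row, filling with 0 (same idea as shift(x, fill = 0))
grp <- c(0, head(cs %/% size, -1))  # 0 0 0 1 1 2 2

print(data.frame(N, cs, grp_unshifted, grp))
print(tapply(N, grp, sum))          # group totals: 12 8 8
```

With the lag, the row with cumsum 12 closes group 0 rather than opening group 1. This matches the question's desired1, which ends at row 233 (cumsum 50328) rather than row 232.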

Then do the split:

lst <- split(dt, dt$grp)
tail(lst[[1]], 1)
#   code   N cumsum grp
#1:  233 332  50328   0
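The same steps can be wrapped in a small reusable function. The sketch below uses a hypothetical helper, `split_by_cumsum()`, written in base R. It is not the answer's data.table code: it swaps `shift()` for a base-R lag and runs on a plain data.frame rebuilt from the question's sample.

```r
# Hypothetical helper: split a data frame into consecutive chunks whose
# column `col` sums to roughly `size`, using the lagged %/% grouping
# (base-R lag in place of data.table::shift(x, fill = 0)).
split_by_cumsum <- function(df, col, size) {
  cs <- cumsum(df[[col]])
  grp <- c(0, head(cs %/% size, -1))
  split(df, grp)
}

# Rebuild the question's sample data as a plain data.frame
df <- data.frame(code = rep(1:500, length.out = 200500),
                 N = rep(100:500, 500))

chunks <- split_by_cumsum(df, "N", 50000)
totals <- sapply(chunks, function(x) sum(x$N))

# First four chunks: row counts and N totals
print(sapply(chunks[1:4], nrow))  # 233 126 206 148
print(totals[1:4])                # 50328 49833 49905 49950
```

The first four chunks line up with desired1 to desired4 from the question. As long as no single N reaches 50,000, every chunk except the last should total within max(N) of the target, because each chunk starts just after one boundary crossing and ends at the next. Since `split()` also dispatches to data.table's own method (as the answer relies on), the helper should accept `dt` directly, though only the data.frame case is exercised here.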
