Split data.table by cumsum of column in R

How can I split a data.table into chunks with roughly equal cumulative sums of the N column? The data contain codes, and N is the number of lines for each code in a much larger data set (which I haven't reproduced here).

I'd like to split the codes at approximately every 50,000 of cumulative N, producing data.tables with varying numbers of rows, each holding unique codes whose N values sum to approximately 50,000.

In reality the N values are random, not patterned, but this does a good job of replicating the data for a small sample:

library(data.table)

# rep(100:500, 500) yields the same 200,500 values as the five concatenated rep() calls
N <- rep(100:500, times = 500)
dt <- data.table(code = seq_along(N), N = N)  # one unique code per row
dt[, cumsum := cumsum(N)]
desired1 <- dt[1:233, ]    # first ~50,000 of cumulative N
desired2 <- dt[234:359, ]
desired3 <- dt[360:565, ]
desired4 <- dt[566:713, ]  # etc., a new table every ~50,000 of cumulative N

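We can create a grouping variable with integer division (%/%): cumsum %/% 50000 gives the number of complete 50,000 blocks reached at each row. Lagging that value by one row with shift() keeps the row that crosses each threshold in the earlier group, so every group ends on the first row whose running total passes a multiple of 50,000.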

dt[, grp := shift(cumsum %/% 50000, fill = 0)]
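
To see why the lag matters, here is a minimal sketch on toy data. The values, the threshold of 10, and the names toy and bucket are made up for illustration and do not come from the question.

```r
library(data.table)

# Hypothetical toy data with a threshold of 10 instead of 50,000
toy <- data.table(code = 1:6, N = c(4, 5, 3, 6, 7, 2))
toy[, cumsum := cumsum(N)]             # running total: 4 9 12 18 25 27
toy[, bucket := cumsum %/% 10]         # complete blocks of 10 reached: 0 0 1 1 2 2
toy[, grp := shift(bucket, fill = 0)]  # lagged by one row: 0 0 0 1 1 2

# Total N per group
toy_sums <- toy[, .(total_N = sum(N)), by = grp]
```

Without the lag, row 3 (the row that pushes the running total past 10) would open group 1 and group 0 would total only 9. With the lag it closes group 0, which then totals 12.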

Then split dt on the grouping variable:

lst <- split(dt, dt$grp)
tail(lst[[1]], 1)
#   code   N cumsum grp
#1:  233 332  50328   0
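
As a sanity check, the per-group totals can be summarised before splitting. This is a sketch that rebuilds the example data so it runs on its own. The name summary_dt and the columns rows, first_code, last_code and total_N are illustrative choices, not part of the original answer.

```r
library(data.table)

# Rebuild the example data (same N values as in the question)
N <- rep(100:500, times = 500)
dt <- data.table(code = seq_along(N), N = N)
dt[, cumsum := cumsum(N)]
dt[, grp := shift(cumsum %/% 50000, fill = 0)]

# Row count, code range and total N for each group
summary_dt <- dt[, .(rows       = .N,
                     first_code = code[1],
                     last_code  = code[.N],
                     total_N    = sum(N)),
                 by = grp]
head(summary_dt, 3)
# group 0: codes   1-233, total_N 50328
# group 1: codes 234-359, total_N 49833
# group 2: codes 360-565, total_N 49905
```

The boundaries sit at fixed multiples of 50,000 of the overall running total, so each group's own total lands near 50,000 rather than exactly on it. data.table also provides its own split method, so split(dt, by = "grp") should give an equivalent list.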
