How can I split a data.table into chunks of roughly equal cumulative sum of the N column? The data contain codes, and N is the number of lines each code has in a much larger data set (not reproduced here).
I'd like to split the codes into groups whose N values add up to approximately 50,000. This would produce data.tables with varying numbers of rows, each containing unique codes whose total N is approximately 50,000.
In reality N is random rather than patterned, but this replicates the data well enough for a small sample:
library(data.table)
# 200,500 rows: N cycles through 100:500 five hundred times; code (1:500) is recycled to that length
dt <- data.table(code = 1:500, N = rep(100:500, 500))
dt[, cumsum := cumsum(N)]
desired1 <- dt[1:233, ]    # first chunk: cumsum first reaches 50,000 at row 233
desired2 <- dt[234:359, ]
desired3 <- dt[360:565, ]
desired4 <- dt[566:713, ]  # etc., one chunk per further 50,000 of cumsum
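The desired boundaries follow one rule. Each chunk ends at the first row whose running total reaches the next multiple of 50,000. The base R check below rebuilds the example's N values (assumed identical to rep(100:500, 500)) and finds those rows with which(), so it runs without data.table.

```r
# Rebuild the example's N column in base R (same values as the data.table example)
N  <- rep(100:500, 500)
cs <- cumsum(N)

# For each multiple of 50,000, find the first row whose running total reaches it
ends <- sapply(1:4 * 50000, function(m) which(cs >= m)[1])
ends          # 233 359 565 713
cs[ends]      # 50328 100161 150066 200016
cs[ends - 1]  # 49996 99703 149803 199605, each just below its multiple
```

The rows returned match the last rows of desired1 to desired4.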
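We create a grouping variable for splitting with %/% (integer division of the running total by 50,000). We lag it one row with shift() so that each chunk keeps the row that crosses the threshold: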
dt[, grp := shift(cumsum %/% 50000, fill = 0)]
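A toy example makes the effect of the lag visible. The five N values of 30 and the threshold of 50 are made up for illustration. c(0, head(q, -1)) is used as a base R stand-in for shift(q, fill = 0).

```r
# Toy data: five rows of N = 30 and a threshold of 50 (both made up for illustration)
N  <- c(30, 30, 30, 30, 30)
cs <- cumsum(N)              # 30 60 90 120 150
q  <- cs %/% 50              # 0 1 1 2 3   (unlagged quotient)
grp <- c(0, head(q, -1))     # 0 0 1 1 2   (lagged one row, like shift(q, fill = 0))

tapply(N, q, sum)    # unlagged chunks: 30 60 30 30
tapply(N, grp, sum)  # lagged chunks:   60 60 30
```

Without the lag, the row that pushes the total past a multiple starts the next group, so chunks stop just short of the threshold. On the question's data, that would end the first chunk at row 232 (cumsum 49,996) instead of row 233.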
and then do the split
lst <- split(dt, dt$grp)
tail(lst[[1]], 1)
# code N cumsum grp
#1: 233 332 50328 0
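To check the whole result, the sketch below applies the same lagged %/% grouping with a plain data.frame in place of a data.table. This is a base R equivalent, not the answer's data.table code. The helper column row is added only to record original row positions. It then reports the row range and total N of the first four chunks.

```r
# Base R sketch of the same lagged cumsum %/% grouping (data.frame instead of data.table)
N  <- rep(100:500, 500)                        # same N values as the question's example
df <- data.frame(code = rep_len(1:500, length(N)), N = N)
df$row <- seq_len(nrow(df))                    # hypothetical helper: original row positions
cs <- cumsum(df$N)
df$grp <- c(0, head(cs %/% 50000, -1))         # lag one row, like shift(..., fill = 0)
lst <- split(df, df$grp)

# First and last original row of each of the first four chunks
rng <- t(sapply(lst[1:4], function(d) range(d$row)))
rng   # 1-233, 234-359, 360-565, 566-713

# Total N in each of the first four chunks
tot <- sapply(lst[1:4], function(d) sum(d$N))
tot   # 50328 49833 49905 49950
```

The row ranges match desired1 to desired4. The totals land slightly above or below 50,000 because the boundaries are fixed at multiples of the overall running total, so the overshoot of one chunk is taken from the next. As long as no single N reaches 50,000, each chunk should stay within one N value of the target.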