Split data.table by cumsum of column in R

Question

How can I split a data.table into chunks of roughly equal cumulative sum of the N column? The data contain codes, and N is the number of lines for each code in a much larger data set (not reproduced here).

I'd like to split the codes at approximately every 50,000 of cumulative N. This would produce data.tables with varying numbers of rows, each holding unique codes whose N values total approximately 50,000.

In reality N is random rather than patterned, but the following replicates the data well enough for a small sample:

library(data.table)
# N repeats 100:500 five hundred times (200,500 rows); code (1:500) is recycled to that length
dt <- data.table(code = 1:500, N = rep(100:500, 500))
dt[, cumsum := cumsum(N)]
desired1 <- dt[1:233,] ###first 50,000 cumsum of N
desired2 <- dt[234:359,]
desired3 <- dt[360:565,]
desired4 <- dt[566:713,] ###etc every 50,000 cumsum of N
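To see how close these target subsets come to 50,000, a quick check sums N within each one. This is a sketch that rebuilds the sample data above with `code` repeated explicitly; `desired_rows` and `chunk_sums` are illustrative names, not part of the original question.

```r
library(data.table)

# Rebuild the sample data: 401 copies of code 1:500 and 500 copies of N 100:500 (200,500 rows each)
dt <- data.table(code = rep(1:500, 401), N = rep(100:500, 500))

# Row ranges of the desired subsets listed above
desired_rows <- list(1:233, 234:359, 360:565, 566:713)

# Total N inside each desired subset
chunk_sums <- sapply(desired_rows, function(i) sum(dt$N[i]))
print(chunk_sums)
# [1] 50328 49833 49905 49950
```

Each subset ends on the first row where the running total passes the next multiple of 50,000, so the totals hover just above or below 50,000 rather than hitting it exactly.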

Answer 1

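We create a grouping variable by integer-dividing the cumulative sum by 50000 with %/%. The result is lagged one row with shift() so that the row which crosses each 50,000 boundary stays in the chunk it completes instead of starting the next one.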

dt[, grp := shift(cumsum %/% 50000, fill = 0)]
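To see why the shift() is there, here is a toy version with a threshold of 100 instead of 50,000. The data and the `cs`/`bin` column names are made up for illustration.

```r
library(data.table)

# Toy data with a threshold of 100 so the grouping is easy to follow by eye
toy <- data.table(code = 1:5, N = c(30L, 40L, 50L, 20L, 70L))
toy[, cs := cumsum(N)]                 # 30 70 120 140 210
toy[, bin := cs %/% 100L]              #  0  0   1   1   2  (crossing row already bumped up)
toy[, grp := shift(bin, fill = 0L)]    #  0  0   0   1   1  (lagged: crossing row stays in its chunk)
print(toy)

# Per-chunk totals of N: 120 for rows 1-3, 90 for rows 4-5
print(toy[, sum(N), by = grp])
```

Without the lag, row 3 (cumulative 120) would open group 1 and group 0 would total only 70, short of the threshold.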

and then split on that variable:

lst <- split(dt, dt$grp)
tail(lst[[1]], 1)
#   code   N cumsum grp
#1:  233 332  50328   0
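Before splitting, it can help to summarise the chunks to confirm their sizes. The sketch below rebuilds the sample data and applies the same grouping; `chunk_summary` is an illustrative name.

```r
library(data.table)

# Sample data from the question, with code repeated explicitly to 200,500 rows
dt <- data.table(code = rep(1:500, 401), N = rep(100:500, 500))
dt[, cumsum := cumsum(N)]
dt[, grp := shift(cumsum %/% 50000, fill = 0)]

# One row per chunk: how many codes it holds and its total N
chunk_summary <- dt[, .(rows = .N, total_N = sum(N)), by = grp]
print(head(chunk_summary, 4))
```

The first four chunks hold 233, 126, 206 and 148 rows, matching desired1 to desired4 in the question. Because the boundaries are fixed multiples of 50,000 in the overall running total, an over- or undershoot in one chunk does not carry over into the next.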

solution1 2 ACCEPTED 2018-07-16 04:58:31