R: quantile buckets of equal size after summing within each bucket

How do I split `quantity` into groups so that each group has roughly the same total after summing?

The example below divides `quantity` into 10 groups with the same number of items per group.

    library(dplyr)
    library(ggplot2)

    set.seed(42)
    quantity <- runif(100, 0, 100)
    dat <- data.frame(
      qty = quantity,
      # decile buckets: each bucket holds the same number of items
      qtile = cut(quantity, quantile(quantity, seq(0, 1, 0.1)),
                  include.lowest = TRUE))
    dat <- dat %>% group_by(qtile) %>% summarise(qty = sum(qty))
    ggplot(dat, aes(qtile, qty)) + geom_bar(stat = 'identity')

How do I form the groups so that, after the `summarise` step, `qty` is roughly equal across groups?

In this example the total `qty` is 5244.787, so each group should have about 524.4787 after `summarise`.
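To make the target concrete, the sketch below uses base R with the question's seed and decile breaks. It computes the per-group goal as the total divided by 10 and reports how far each equal-count bucket falls from it. Because every bucket holds exactly 10 items, buckets covering larger values necessarily have larger totals, which is why equal-count quantiles cannot give equal sums.

```r
# Sketch: compare the question's equal-count decile buckets with the
# equal-sum target (total / 10). Base R only; same seed as the question.
set.seed(42)
quantity <- runif(100, 0, 100)

n_groups <- 10
total <- sum(quantity)
target <- total / n_groups   # per-group sum we would like each bucket to reach

decile <- cut(quantity, quantile(quantity, seq(0, 1, 0.1)),
              include.lowest = TRUE)
counts <- as.vector(table(decile))                # items per bucket
sums <- as.vector(tapply(quantity, decile, sum))  # total qty per bucket

print(data.frame(bucket = levels(decile),
                 n = counts,
                 qty = round(sums, 1),
                 diff_from_target = round(sums - target, 1)))
```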

This is as far as I could get. It seems to work well enough for my use case. If anyone has ideas for improvement, feel free to update the answer.

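    library(data.table)
    library(ggplot2)

    set.seed(42)
    quantity <- runif(100, 0, 100)

    # weight each row by its own quantity
    dat <- data.table(
      qty = quantity,
      wt = quantity
    )
    dat[!is.na(qty), avg := sum(wt) / 10]      # target weight per group
    setorder(dat, qty, wt)                      # sort ascending by qty
    dat[!is.na(qty), cum_wt := cumsum(wt)]     # running total of weight
    dat[!is.na(qty), level := cum_wt / avg]    # running total in units of avg
    # guard: floating-point rounding could push the last level just above 10
    dat[!is.na(qty), qtile := pmin(ceiling(level), 10)]

    dat <- dat[, .(qty = sum(qty)), by = 'qtile']

    ggplot(dat, aes(factor(qtile), qty)) + geom_bar(stat = 'identity')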
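The same cumulative-sum idea can be written in base R as a small reusable function. This is a sketch, not the answer's exact code: `equal_sum_buckets` is a hypothetical helper name, and it assumes `x` is non-negative with no `NA` values. Sorting first keeps each bucket a contiguous range of values, like a quantile bin. The `pmin`/`pmax` clamp keeps every label in `1..n_groups` even if rounding nudges a cumulative share past the ends.

```r
# Hypothetical helper (not from any package): assign each element of x to one
# of n_groups contiguous buckets whose sums are roughly equal.
# Assumes x is non-negative and contains no NA values.
equal_sum_buckets <- function(x, n_groups = 10) {
  ord <- order(x)                        # work in ascending order of x
  cum <- cumsum(x[ord])                  # running total along sorted x
  avg <- cum[length(cum)] / n_groups     # target sum per bucket
  grp_sorted <- ceiling(cum / avg)       # bucket = which multiple of avg we are in
  grp_sorted <- as.integer(pmin(pmax(grp_sorted, 1), n_groups))  # clamp to 1..n_groups
  grp <- integer(length(x))
  grp[ord] <- grp_sorted                 # map labels back to original positions
  grp
}

set.seed(42)
quantity <- runif(100, 0, 100)
grp <- equal_sum_buckets(quantity, 10)
sums <- tapply(quantity, grp, sum)       # total quantity per bucket
print(round(sums, 1))
```

Items cannot be split between buckets, so each bucket total differs from the target by less than the largest single value (here under 100, against a target of about 524). The groups are therefore roughly, not exactly, equal.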
