How to bin numbers in R?

I have a data frame like this:
Col-1: id.
Col-2: ranges from 0 to 100.
Col-3: value.

id col-2        value
...
id 10.00          2 
id 10.53          2 
id 11.11         88 
id 11.76          6 
id 12.00          2 
id 12.12          2 
id 12.35        163 
id 12.50          6 
id 12.90          2 
id 13.33          5 
id 13.58        366 
id 13.64          8 
id 14.29         10 
id 14.81        725 
...
id 100  45

I want to split Col-2 into 100 bins and sum the Col-3 values that fall within each interval. How can I do that? For example, the output would look something like this:

id  0-1    sum-value-in-interval
id  1-2    sum-value-in-interval
id  2-3    sum-value-in-interval
...
id  10-11  4
id  11-12  94
...
id  99-100 sum-value-in-interval

Thanks for the help!

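This is a dplyr-based solution. Let your data be called dat. Note that ntile() splits the rows into 100 rank-based groups of roughly equal size, so these are quantile bins rather than the fixed-width intervals 0-1, 1-2, ... shown in the question: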

library(dplyr)

dat %>%
  mutate(quantile = ntile(col2, 100)) %>%
  group_by(quantile) %>%
  summarize(sumValueInInterval = sum(col3))
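
If fixed-width intervals like those in the question are wanted (ntile() forms rank-based groups of roughly equal size instead), one possible sketch is to bin with floor(). The toy rows below are copied from the question; the names dat, col2 and col3 follow this answer's naming, and the bin column is an assumption made for illustration.

```r
library(dplyr)

# A few rows from the question, used as stand-in data
dat <- data.frame(
  id   = "id",
  col2 = c(10.00, 10.53, 11.11, 11.76, 12.00),
  col3 = c(2, 2, 88, 6, 2)
)

binned <- dat %>%
  # floor() maps x to the unit-wide bin [k, k + 1);
  # pmin() folds x == 100 into the last bin, 99-100
  mutate(bin = pmin(floor(col2), 99)) %>%
  group_by(id, bin) %>%
  summarize(sumValueInInterval = sum(col3), .groups = "drop")

binned
```

Here bin holds the lower edge of each interval, so bin 10 stands for 10-11. With these rows, bins 10 and 11 sum to 4 and 94, matching the example output in the question.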

We can use cut to create a grouping variable and then use it in aggregate to get the sum of 'col3'.

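# breaks = 0:100 gives 100 unit-wide bins; right = FALSE makes them [a, b), as in the question's example
df1$group <- as.character(cut(df1$col2, breaks = 0:100, right = FALSE, include.lowest = TRUE))
aggregate(col3 ~ group + id, df1, FUN = sum)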
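
The aggregate output lists only bins that contain at least one row, and cut labels bins in interval notation. If all 100 bins and labels such as 10-11 are wanted, a sketch along the following lines may help. It assumes breaks of 0:100 and left-closed intervals (chosen to reproduce the 4 and 94 in the question), and the names bin_labels and sums are made up for illustration.

```r
# Stand-in data built from rows in the question; names follow this answer
df1 <- data.frame(
  id   = "id",
  col2 = c(10.00, 10.53, 11.11, 11.76, 12.00, 100),
  col3 = c(2, 2, 88, 6, 2, 45)
)

# Labels "0-1", "1-2", ..., "99-100" in place of cut()'s interval notation
bin_labels <- paste(0:99, 1:100, sep = "-")
df1$group <- cut(df1$col2, breaks = 0:100, labels = bin_labels,
                 right = FALSE, include.lowest = TRUE)

# Keeping `group` as a factor lets xtabs() report every level,
# with a sum of 0 for bins that contain no rows
sums <- as.data.frame(xtabs(col3 ~ id + group, data = df1),
                      responseName = "col3")

# Show only the non-empty bins
sums[sums$col3 > 0, ]
```

The full sums data frame has one row per id and bin, so empty intervals show up with a sum of 0 rather than disappearing.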

Or this can be done with data.table

library(data.table)
# 100 unit-wide bins: [0,1), [1,2), ..., [99,100]
setDT(df1)[, group := cut(col2, breaks = 0:100, right = FALSE, include.lowest = TRUE)
           ][, list(col3 = sum(col3)), .(group, id)]

data

set.seed(24)
df1 <- data.frame(id= paste0('id', rep(1:2, each=50)), 
  col2=rnorm(100, sample(100)), col3= sample(500, 100, replace=TRUE))
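
Because col2 in this sample is drawn with rnorm around the means 1 to 100, some values may land slightly below 0 or above 100, and cut returns NA for anything outside its breaks. A quick check, assuming breaks of 0:100 (an assumption, not part of the original answer), could look like this:

```r
set.seed(24)
df1 <- data.frame(id = paste0('id', rep(1:2, each = 50)),
  col2 = rnorm(100, sample(100)), col3 = sample(500, 100, replace = TRUE))

group <- cut(df1$col2, breaks = 0:100, right = FALSE, include.lowest = TRUE)

# Rows whose col2 lies outside [0, 100] get NA; the formula interface of
# aggregate() drops such rows by default
sum(is.na(group))
range(df1$col2)
```

If the count is not zero, widening the breaks or clamping col2 to the 0 to 100 range before binning would keep those rows in the sums.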
