R: grouping numbers into bins

I am looking to find the smallest number in a column of a data frame that is larger than a number in another array.

Example
DistrDF

Bin Freq CumSum  
0.1 0.05 0.05  
0.2 0.07 0.12    
0.3 0.20 0.32  
0.4 0.10 0.42  
0.5 0.00 0.42   
0.6 0.15 0.57  
0.7 0.00 0.57  
0.8 0.30 0.87  
0.9 0.11 0.98  
1.0 0.02 1.0

Then I have an array of, say, 10 random numbers between 0 and 1 (i.e., each random number will fall into one of the bins in DistrDF):

RandNums
0.13
0.50
0.11
0.10
0.70
0.05
0.12
0.80
0.88
0.40

I would like to use these two tables to create a third table, which indicates which bin each of the random numbers falls into, as below:

ResultDF  
0.30 (because 0.13 < 0.32 and 0.13 > 0.12)
0.60 (because 0.50 < 0.57 and 0.50 > 0.42)
...
0.40 (because 0.40 < 0.42 and 0.40 > 0.32)

Does anyone have any ideas? I feel like an aggregate or something might be in order, but I'm not sure.

The cut function does what you want:

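# cut() requires unique breaks; zero-frequency bins repeat the previous CumSum, so drop them
DistrDF <- DistrDF[DistrDF$Freq > 0, ]
# x is assumed to be a data frame holding the random numbers in a column named RandNums;
# labels = FALSE makes cut() return integer bin indices instead of a factor
DistrDF$Bin[cut(x$RandNums, c(0, DistrDF$CumSum), labels = FALSE)]
# [1] 0.3 0.6 0.2 0.2 0.8 0.1 0.2 0.8 0.9 0.4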

You can manipulate the include.lowest and right parameters to change how you handle points that fall on the border of bins.
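
As a rough illustration of that last point, here is a self-contained sketch that rebuilds the example data from the question and compares the default right = TRUE with right = FALSE. The names NonEmpty, breaks, res_right and res_left are made up for this example, and RandNums is stored as a plain vector rather than a data frame column.

```r
# Example data copied from the question
DistrDF <- data.frame(
  Bin    = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
  Freq   = c(0.05, 0.07, 0.20, 0.10, 0.00, 0.15, 0.00, 0.30, 0.11, 0.02),
  CumSum = c(0.05, 0.12, 0.32, 0.42, 0.42, 0.57, 0.57, 0.87, 0.98, 1.00)
)
RandNums <- c(0.13, 0.50, 0.11, 0.10, 0.70, 0.05, 0.12, 0.80, 0.88, 0.40)

# Drop empty bins so the break points are unique
NonEmpty <- DistrDF[DistrDF$Freq > 0, ]
breaks <- c(0, NonEmpty$CumSum)

# Default (right = TRUE): intervals are (lower, upper],
# so a value equal to a CumSum entry stays in that bin
res_right <- NonEmpty$Bin[cut(RandNums, breaks, labels = FALSE)]

# right = FALSE: intervals are [lower, upper), so a value equal to a
# CumSum entry moves up to the next bin; include.lowest = TRUE closes
# the last interval so that a value of exactly 1 is still binned
res_left <- NonEmpty$Bin[cut(RandNums, breaks, labels = FALSE,
                             right = FALSE, include.lowest = TRUE)]

print(res_right)
# [1] 0.3 0.6 0.2 0.2 0.8 0.1 0.2 0.8 0.9 0.4
print(res_left)
# [1] 0.3 0.6 0.2 0.2 0.8 0.2 0.3 0.8 0.9 0.4
```

Only the sixth and seventh values (0.05 and 0.12) change, because they are the only ones that sit exactly on a break point. With the default right = TRUE, a value of exactly 0 would come back as NA unless include.lowest = TRUE is also set.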
