R: grouping numbers into bins

I am looking to find the smallest number in a column of a data frame that is larger than a number in another array.

Example
DistrDF

Bin Freq CumSum  
0.1 0.05 0.05  
0.2 0.07 0.12    
0.3 0.20 0.32  
0.4 0.10 0.42  
0.5 0.00 0.42   
0.6 0.15 0.57  
0.7 0.00 0.57  
0.8 0.30 0.87  
0.9 0.11 0.98  
1.0 0.02 1.0

Then I have an array of, say, 10 random numbers between 0 and 1 (i.e. each random number will fall into one of the bins in DistrDF):

RandNums
0.13
0.50
0.11
0.10
0.70
0.05
0.12
0.80
0.88
0.40

I would like to use these two tables to create a third table, which indicates which bin each of the random numbers falls into, as below:

ResultDF  
0.30 (because 0.13 < 0.32 and 0.13 > 0.12)
0.60 (because 0.50 < 0.57 and 0.50 > 0.42)
...
0.40 (because 0.40 < 0.42 and 0.40 > 0.32)

Does anyone have any ideas? I feel like an aggregate or something might be in order, but I'm not sure.

The cut function does what you want:

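# x is assumed to be a data frame holding the random numbers in a RandNums column
DistrDF <- DistrDF[DistrDF$Freq > 0,]  # Remove empty bins (their CumSum repeats the previous break)
DistrDF$Bin[cut(x$RandNums, c(0, DistrDF$CumSum))]
# [1] 0.3 0.6 0.2 0.2 0.8 0.1 0.2 0.8 0.9 0.4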
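
For a self-contained run, the two inputs from the question can be rebuilt by hand. This is only a sketch: it assumes the random numbers sit in a data frame named x with a RandNums column (which is what the code above implies, though the question does not say how they are stored), and the names NonEmpty and BinIndex are introduced here purely for illustration.

```r
# Rebuild the question's table of bins and cumulative frequencies
DistrDF <- data.frame(
  Bin    = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
  Freq   = c(0.05, 0.07, 0.20, 0.10, 0.00, 0.15, 0.00, 0.30, 0.11, 0.02),
  CumSum = c(0.05, 0.12, 0.32, 0.42, 0.42, 0.57, 0.57, 0.87, 0.98, 1.00)
)

# Assumption: the random numbers are a column of a data frame named x
x <- data.frame(RandNums = c(0.13, 0.50, 0.11, 0.10, 0.70,
                             0.05, 0.12, 0.80, 0.88, 0.40))

# Empty bins repeat the previous CumSum, and cut() requires unique breaks,
# so keep only the bins with a non-zero frequency
NonEmpty <- DistrDF[DistrDF$Freq > 0, ]

# cut() returns a factor; used as an index it selects by its integer codes
BinIndex <- cut(x$RandNums, breaks = c(0, NonEmpty$CumSum))
ResultDF <- data.frame(RandNums = x$RandNums, Bin = NonEmpty$Bin[BinIndex])
print(ResultDF)
```

Keeping the filtered table in a separate variable, rather than overwriting DistrDF, leaves the original ten rows available for later use.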

You can adjust the include.lowest and right parameters of cut to change how points that fall exactly on the border between two bins are handled.
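
To see what those two parameters change, here is a small sketch using a few boundary values. The vectors Breaks, Bins, and Edge are made up for this illustration; Breaks and Bins simply restate the non-empty rows of the question's table.

```r
# Breaks and bin labels for the non-empty bins of the question's table
Breaks <- c(0, 0.05, 0.12, 0.32, 0.42, 0.57, 0.87, 0.98, 1.00)
Bins   <- c(0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 0.9, 1.0)

# Hypothetical values that sit exactly on bin boundaries
Edge <- c(0, 0.05, 0.12, 1)

# Default (right = TRUE): intervals are (lo, hi], so 0 is outside every bin
Default <- Bins[cut(Edge, Breaks)]

# include.lowest = TRUE closes the first interval, giving [0, 0.05]
WithLowest <- Bins[cut(Edge, Breaks, include.lowest = TRUE)]

# right = FALSE: intervals are [lo, hi), so a boundary value moves to the
# bin above it and the top value 1 is now outside every bin
LeftClosed <- Bins[cut(Edge, Breaks, right = FALSE)]

print(rbind(Default, WithLowest, LeftClosed))
```

Values that fall outside every interval come back as NA, which is an easy way to spot a boundary convention that does not match the data.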
