I am looking for the smallest number in a data frame column that is larger than a given number from another array, for each number in that array.
Example
DistrDF
Bin Freq CumSum
0.1 0.05 0.05
0.2 0.07 0.12
0.3 0.20 0.32
0.4 0.10 0.42
0.5 0.00 0.42
0.6 0.15 0.57
0.7 0.00 0.57
0.8 0.30 0.87
0.9 0.11 0.98
1.0 0.02 1.0
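To make the example reproducible, DistrDF can be typed straight into R. This sketch simply copies the values from the table above; the check confirms that CumSum is the running total of Freq, and the last line shows that the two empty bins repeat the previous cumulative value.

```r
# Values copied from the DistrDF table above
DistrDF <- data.frame(
  Bin    = (1:10) / 10,  # 0.1, 0.2, ..., 1.0
  Freq   = c(0.05, 0.07, 0.20, 0.10, 0.00, 0.15, 0.00, 0.30, 0.11, 0.02),
  CumSum = c(0.05, 0.12, 0.32, 0.42, 0.42, 0.57, 0.57, 0.87, 0.98, 1.00)
)

# CumSum should be the running total of Freq; all.equal() tolerates
# the floating-point rounding that cumsum() introduces
stopifnot(isTRUE(all.equal(DistrDF$CumSum, cumsum(DistrDF$Freq))))

# The empty bins (0.5 and 0.7) repeat the previous cumulative value
DistrDF[duplicated(DistrDF$CumSum), ]
```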
Then I have an array of, say, 10 random numbers between 0 and 1 (i.e. each random number falls into one of the bins in DistrDF):
RandNums
0.13
0.50
0.11
0.10
0.70
0.05
0.12
0.80
0.88
0.40
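In R the "array" is just a numeric vector. A minimal sketch: the first line holds the ten example values; the rest shows one way to draw a fresh set, where `NewDraws` is a hypothetical name and the seed is arbitrary.

```r
# The ten example values from the question, as a numeric vector
RandNums <- c(0.13, 0.50, 0.11, 0.10, 0.70, 0.05, 0.12, 0.80, 0.88, 0.40)

# A fresh draw of the same size (NewDraws is a hypothetical name).
# The seed is arbitrary and only makes the result reproducible.
# runif() defaults to min = 0, max = 1 and does not return the end
# points, so every draw lies strictly inside (0, 1).
set.seed(1)
NewDraws <- runif(length(RandNums))
```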
I would like to use these two tables to create a third table indicating which bin each random number falls into, as below:
ResultDF
0.30 (because 0.13 < 0.32 and 0.13 > 0.12)
0.60 (because 0.50 < 0.57 and 0.50 > 0.42)
...
0.40 (because 0.40 < 0.42 and 0.40 > 0.32)
Does anyone have any ideas? I feel like aggregate() or something similar might be in order, but I'm not sure.
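For reference, the rule spelled out in ResultDF (take the smallest CumSum that is at least the random number, and report that row's Bin) can be written as a direct loop over the values. This is a minimal sketch assuming RandNums is a numeric vector; `first_bin` is a hypothetical helper. It uses `>=` so that a value sitting exactly on a cumulative total (such as 0.05 or 0.12) goes into the bin that ends there, which is also what cut() does by default; use `>` for a strictly "larger than" comparison.

```r
DistrDF <- data.frame(
  Bin    = (1:10) / 10,
  CumSum = c(0.05, 0.12, 0.32, 0.42, 0.42, 0.57, 0.57, 0.87, 0.98, 1.00)
)
RandNums <- c(0.13, 0.50, 0.11, 0.10, 0.70, 0.05, 0.12, 0.80, 0.88, 0.40)

# CumSum is sorted, so the first position where CumSum >= r holds the
# smallest such CumSum. With repeated values the earlier, non-empty bin is
# found first, e.g. 0.50 maps to 0.6 rather than to the empty 0.7 bin.
first_bin <- function(r) DistrDF$Bin[which(DistrDF$CumSum >= r)[1]]

ResultDF <- data.frame(
  RandNum = RandNums,
  Bin     = vapply(RandNums, first_bin, numeric(1))
)
ResultDF
```

This scans the whole CumSum column once per value, which is fine for small tables; the vectorised answer below scales better.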
The cut() function does what you want:
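DistrDF <- DistrDF[DistrDF$Freq > 0, ]  # Remove empty bins: they repeat the previous CumSum, and cut() needs unique breaks
# cut() returns a factor; indexing with it uses the integer bin codes 1, 2, ...
DistrDF$Bin[cut(RandNums, c(0, DistrDF$CumSum))]
# [1] 0.3 0.6 0.2 0.2 0.8 0.1 0.2 0.8 0.9 0.4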
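If you would rather keep the empty bins in DistrDF, findInterval() is an alternative to cut(): it accepts repeated break points as long as they are non-decreasing. A minimal sketch, assuming the same DistrDF and RandNums as in the question (`edges` and `idx` are illustrative names); the `left.open` argument requires R 3.3.0 or later.

```r
DistrDF <- data.frame(
  Bin    = (1:10) / 10,
  CumSum = c(0.05, 0.12, 0.32, 0.42, 0.42, 0.57, 0.57, 0.87, 0.98, 1.00)
)
RandNums <- c(0.13, 0.50, 0.11, 0.10, 0.70, 0.05, 0.12, 0.80, 0.88, 0.40)

# Bin edges: 0 followed by the cumulative sums. The repeated edges from the
# empty bins are allowed, since findInterval() only needs a non-decreasing vector.
edges <- c(0, DistrDF$CumSum)

# With left.open = TRUE, idx[k] is the i with edges[i] < RandNums[k] <= edges[i + 1],
# i.e. the number of edges strictly below the value. For tied edges this lands
# on the later copy, which skips the zero-width bins (0.50 -> 0.6, not 0.5).
# A value of exactly 0 would give idx = 0, which selects no bin.
idx <- findInterval(RandNums, edges, left.open = TRUE)
DistrDF$Bin[idx]
```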
You can adjust the include.lowest and right parameters of cut() to change how points that fall exactly on a bin border are handled.
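To see what those two parameters do, here is a small sketch on values that sit on (or next to) the bin borders. It assumes the non-empty bins left after the `Freq > 0` filter; `bins`, `breaks` and `border_vals` are illustrative names, and `labels = FALSE` makes cut() return the integer bin codes directly.

```r
# Non-empty bins and their breaks, i.e. what the answer uses after
# dropping the Freq == 0 rows
bins   <- c(0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 0.9, 1.0)
breaks <- c(0, 0.05, 0.12, 0.32, 0.42, 0.57, 0.87, 0.98, 1.00)

# Illustrative values on or next to bin borders
border_vals <- c(0, 0.05, 0.12, 0.13)

# Default right = TRUE: intervals are (a, b], so 0 lies in no interval (NA)
# and 0.05, 0.12 go to the bin that ends at them: NA 0.1 0.2 0.3
by_default <- bins[cut(border_vals, breaks, labels = FALSE)]

# include.lowest = TRUE also closes the first interval, [0, 0.05]: 0.1 0.1 0.2 0.3
with_lowest <- bins[cut(border_vals, breaks, labels = FALSE, include.lowest = TRUE)]

# right = FALSE: intervals are [a, b), so border values move up one bin: 0.1 0.2 0.3 0.3
left_closed <- bins[cut(border_vals, breaks, labels = FALSE, right = FALSE)]
```

With `right = FALSE`, `include.lowest = TRUE` closes the highest interval instead, so a value of exactly 1.0 would be kept rather than becoming NA.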