Count the number of values per column above a range of thresholds in R
How can I count the number of values per column above a sequence of thresholds?
That is, for each column, calculate the number of values above 100, then above 150, and so on, and store the results in a data frame?
# Reproducible data
# (Original data is daily streamflow values organized in columns per year)
set.seed(1234)
data = data.frame("1915" = runif(365, min = 60, max = 400),
"1916" = runif(365, min = 60, max = 400),
"1917" = runif(365, min = 60, max = 400))
# my code chunk
mymin = 75
mymax = 400
mystep = 25
apply(data, 2, function (x) {
for(i in seq(mymin,mymax,mystep)) {
res = (sum(x > i)) # or nrow(data[x > i,])
return(res)
}
})
This code works for a single iteration only (the return() inside the loop exits on the first threshold), and I can't store the result of each iteration in a data frame.
I also tried approaches such as:
for (i in 1:n){
seuil = seq(mymin, mymax, mystep)
lapply(data, function(x) {
res[[i]] = nrow(data[ x > seuil[i], ])
return(res)
})
}
This does not really work either.
The output would be something like:
year | n values above 75 | n values above 100 | n values above ... |
---|---|---|---|
1915 | 348 | 329 | ... |
1916 | 351 | 325 | ... |
... | ... | ... | ... |
Thanks for your comments and suggestions :)
myseq <- seq(75, 400, by=25)
as.data.frame(do.call(rbind, lapply(data, function(z) table(findInterval(z, myseq)))))
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13
# X1915 17 19 26 27 41 23 26 33 27 22 30 25 21 28
# X1916 14 26 20 28 25 26 22 23 35 28 26 30 22 40
# X1917 20 30 24 31 24 28 22 25 28 34 18 21 26 34
Or, if you prefer the factor levels that R generates with cut, then:
as.data.frame(do.call(rbind, lapply(data, function(z) table(cut(z, myseq)))))
# (75,100] (100,125] (125,150] (150,175] (175,200] (200,225] (225,250] (250,275] (275,300] (300,325] (325,350] (350,375] (375,400]
# X1915 19 26 27 41 23 26 33 27 22 30 25 21 28
# X1916 26 20 28 25 26 22 23 35 28 26 30 22 40
# X1917 30 24 31 24 28 22 25 28 34 18 21 26 34
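Note that both tables above count values per interval, whereas the question asks for the number of values above each threshold. A minimal sketch of one way to get from one to the other is a reverse cumulative sum over the interval counts. The names bins, above and result are illustrative, and tabulate() is swapped in for table() so that empty intervals still produce a 0. Because findInterval() uses left-closed intervals, this counts values at or above each threshold, which matches "above" for continuous data such as these.

```r
set.seed(1234)
data <- data.frame("1915" = runif(365, min = 60, max = 400),
                   "1916" = runif(365, min = 60, max = 400),
                   "1917" = runif(365, min = 60, max = 400))
myseq <- seq(75, 400, by = 25)

# Count values per interval with a fixed number of bins, so every column
# yields a vector of the same length even if an interval is empty.
# findInterval() returns 0 for values below the first threshold, hence the + 1L.
bins <- sapply(data, function(z) {
  tabulate(findInterval(z, myseq) + 1L, nbins = length(myseq) + 1L)
})

# Reverse cumulative sum: the values at or above threshold k are those in
# bins k + 1 and higher. Dropping the first element removes the grand total.
above <- apply(bins, 2, function(b) rev(cumsum(rev(b)))[-1])
rownames(above) <- paste0("above_", myseq)

# One row per year, one column per threshold
result <- as.data.frame(t(above))
result
```

If the data can contain values exactly equal to a threshold and a strict "greater than" is required, counting with sum(y > x) directly is the safer choice.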
You can try:
vals <- seq(mymin,mymax,mystep)
mat <- sapply(vals, function(x) sapply(data, function(y) sum(y > x)))
colnames(mat) <- paste0('values_above_', vals)
mat
# values_above_75 values_above_100 values_above_125 values_above_150 values_above_175
#X1915 348 329 303 276 235
#X1916 351 325 305 277 252
#X1917 345 315 291 260 236
# values_above_200 values_above_225 values_above_250 values_above_275 values_above_300
#X1915 212 186 153 126 104
#X1916 226 204 181 146 118
#X1917 208 186 161 133 99
# values_above_325 values_above_350 values_above_375 values_above_400
#X1915 74 49 28 0
#X1916 92 62 40 0
#X1917 81 60 34 0
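Since the question asks for a data frame with a year column, the matrix above can be converted in one more step. This is only a sketch: the name res and the year column are illustrative, and it assumes the row names carry the "X" prefix that data.frame() adds to numeric column names.

```r
set.seed(1234)
data <- data.frame("1915" = runif(365, min = 60, max = 400),
                   "1916" = runif(365, min = 60, max = 400),
                   "1917" = runif(365, min = 60, max = 400))
mymin <- 75
mymax <- 400
mystep <- 25

# Same computation as in the answer: one row per column of data,
# one column per threshold
vals <- seq(mymin, mymax, mystep)
mat <- sapply(vals, function(x) sapply(data, function(y) sum(y > x)))
colnames(mat) <- paste0("values_above_", vals)

# Strip the "X" prefix to recover the year, then bind it to the counts
res <- data.frame(year = as.integer(sub("^X", "", rownames(mat))), mat)
rownames(res) <- NULL  # drop the row names inherited from the matrix
res
```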