
Count the number of values per column above a range of thresholds in R

How can I count the number of values per column above a sequence of thresholds?

That is: for each column, calculate the number of values above 100, then above 150, and so on, and store the results in a data frame?

# Reproducible data
# (Original data is daily streamflow values organized in columns per year)

set.seed(1234)
data = data.frame("1915" = runif(365, min = 60, max = 400),
                  "1916" = runif(365, min = 60, max = 400),
                  "1917" = runif(365, min = 60, max = 400))

# my code chunk

mymin = 75
mymax = 400
mystep = 25

apply(data, 2, function (x) {
  for(i in seq(mymin,mymax,mystep)) {
    res = (sum(x > i)) # or nrow(data[x > i,])
    return(res)
  }
})

This code works well for one iteration, but I can't store the result of each iteration in a data frame.

I also tried approaches such as:

for (i in 1:n){
  seuil = seq(mymin, mymax, mystep)
  lapply(data, function(x) {
    res[[i]] = nrow(data[ x > seuil[i], ])
    return(res)
  })
}

Which does not really work...

The output would be something like:

year   n values above 75   n values above 100   n values above...
1915   348                 329                  ...
1916   351                 325                  ...
...    ...                 ...                  ...

Thanks for your comments and suggestions :)

myseq <- seq(75, 400, by=25)
as.data.frame(do.call(rbind, lapply(data, function(z) table(findInterval(z, myseq)))))
#        0  1  2  3  4  5  6  7  8  9 10 11 12 13
# X1915 17 19 26 27 41 23 26 33 27 22 30 25 21 28
# X1916 14 26 20 28 25 26 22 23 35 28 26 30 22 40
# X1917 20 30 24 31 24 28 22 25 28 34 18 21 26 34

Or, if you prefer the factor levels that R generates with cut, then:

as.data.frame(do.call(rbind, lapply(data, function(z) table(cut(z, myseq)))))
#       (75,100] (100,125] (125,150] (150,175] (175,200] (200,225] (225,250] (250,275] (275,300] (300,325] (325,350] (350,375] (375,400]
# X1915       19        26        27        41        23        26        33        27        22        30        25        21        28
# X1916       26        20        28        25        26        22        23        35        28        26        30        22        40
# X1917       30        24        31        24        28        22        25        28        34        18        21        26        34
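
Note that both tables above count the values falling within each interval, not the values above each threshold. A minimal sketch of one way to convert interval counts into "above threshold" counts follows, assuming the question's sample data and the same myseq. The names bin_counts and above are illustrative only. Fixing the factor levels keeps the bins aligned across columns even when one bin is empty, and a reverse cumulative sum then accumulates the counts from the highest bin downward. Because findInterval treats intervals as closed on the left, this counts values greater than or equal to each threshold, which only differs from a strict comparison when a value equals a threshold exactly.

```r
# Sample data from the question, regenerated so this block runs on its own.
set.seed(1234)
data <- data.frame("1915" = runif(365, min = 60, max = 400),
                   "1916" = runif(365, min = 60, max = 400),
                   "1917" = runif(365, min = 60, max = 400))
myseq <- seq(75, 400, by = 25)

# Count values per interval, with fixed levels 0..length(myseq) so that
# every column yields the same set of bins even if one of them is empty.
bin_counts <- sapply(data, function(z) {
  table(factor(findInterval(z, myseq), levels = 0:length(myseq)))
})

# Drop bin 0 (values below the first threshold), then sum from the top down:
# row i becomes the number of values >= myseq[i].
above <- apply(bin_counts[-1, , drop = FALSE], 2,
               function(n) rev(cumsum(rev(n))))
rownames(above) <- paste0("above_", myseq)

# Years as rows, thresholds as columns
t(above)
```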

You can try:

vals <- seq(mymin,mymax,mystep)
mat <- sapply(vals, function(x) sapply(data, function(y) sum(y > x)))
colnames(mat) <- paste0('values_above_', vals)
mat

#      values_above_75 values_above_100 values_above_125 values_above_150 values_above_175
#X1915             348              329              303              276              235
#X1916             351              325              305              277              252
#X1917             345              315              291              260              236

#      values_above_200 values_above_225 values_above_250 values_above_275 values_above_300
#X1915              212              186              153              126              104
#X1916              226              204              181              146              118
#X1917              208              186              161              133               99

#      values_above_325 values_above_350 values_above_375 values_above_400
#X1915               74               49               28                0
#X1916               92               62               40                0
#X1917               81               60               34                0
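
To get the layout sketched in the question (one row per year, a year column, then one column per threshold), the matrix can be wrapped in a data frame. The following is only a sketch under the question's sample data: the name result and the stripping of the "X" prefix are assumptions for illustration, and vapply is swapped in for sapply so that the return type of each call is checked.

```r
# Sample data from the question, regenerated so this block runs on its own.
set.seed(1234)
data <- data.frame("1915" = runif(365, min = 60, max = 400),
                   "1916" = runif(365, min = 60, max = 400),
                   "1917" = runif(365, min = 60, max = 400))
vals <- seq(75, 400, by = 25)

# One row per year, one column per threshold; vapply checks that each
# threshold yields exactly one integer count per column of `data`.
mat <- vapply(vals, function(x) {
  vapply(data, function(y) sum(y > x), integer(1))
}, integer(ncol(data)))
colnames(mat) <- paste0("values_above_", vals)

# data.frame() prefixed the numeric column names with "X";
# strip that prefix to recover the year as an integer.
result <- data.frame(year = as.integer(sub("^X", "", rownames(mat))),
                     mat, row.names = NULL)
head(result[, 1:4])
```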
