
Count number of instances above a varying threshold

I have the 95th-percentile (0.95 quantile) temperature threshold for each country. In the example below a week is 4 days. I want to count, in a new vector or single-column data frame, how many days per week each country's temperature is above that country's threshold.

The countries' 95th-percentile temperatures are:

 q95 <- c(26,21,22,20,23)


  DailyTempCountry <- data.frame(Date = c("W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
                                         "W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
                                          "W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
                                          "W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
                                          "W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4"),
                              Country = c("AL","AL", "AL", "AL","AL","AL", "AL", "AL",
                                    "BE","BE", "BE", "BE", "BE","BE", "BE", "BE",
                                    "CA","CA", "CA", "CA","CA","CA", "CA", "CA",
                                    "DE","DE", "DE", "DE","DE","DE", "DE", "DE",
                                    "UK","UK", "UK", "UK","UK","UK", "UK", "UK"),
                              DailyTemp = c(27,25,20,22,20,20,27,27,
                                            24,22,23,18,17,19,20,16,
                                             23,23,23,23,27,26,20,26,
                                            19,18,17,19,16,15,19,18,
                                             20,24,24,20,19,25,19,25))
 DailyTempCountry



   Date Country DailyTemp
1  W1D1      AL        27
2  W1D2      AL        25
3  W1D3      AL        20
4  W1D4      AL        22
5  W2D1      AL        20
6  W2D2      AL        20
7  W2D3      AL        27
8  W2D4      AL        27
9  W1D1      BE        24
10 W1D2      BE        22
11 W1D3      BE        23
12 W1D4      BE        18
13 W2D1      BE        17
14 W2D2      BE        19
15 W2D3      BE        20
16 W2D4      BE        16
17 W1D1      CA        23
18 W1D2      CA        23
19 W1D3      CA        23
20 W1D4      CA        23
21 W2D1      CA        27
22 W2D2      CA        26
23 W2D3      CA        20
24 W2D4      CA        26
25 W1D1      DE        19
26 W1D2      DE        18
27 W1D3      DE        17
28 W1D4      DE        19
29 W2D1      DE        16
30 W2D2      DE        15
31 W2D3      DE        19
32 W2D4      DE        18
33 W1D1      UK        20
34 W1D2      UK        24
35 W1D3      UK        24
36 W1D4      UK        20
37 W2D1      UK        19
38 W2D2      UK        25
39 W2D3      UK        19
40 W2D4      UK        25

What I want is a vector/column that counts the number of days in each week above the country's threshold, like this:

  DaysInWeekAboveQ95 <- c(1,2,3,0,4,3,0,0,2,2)
df_right <- data.frame(Week = c("W1","W2","W1","W2","W1","W2","W1","W2","W1","W2"),
            DaysInWeekAboveQ95 = c(1,2,3,0,4,3,0,0,2,2))

   Week DaysInWeekAboveQ95
1    W1                  1
2    W2                  2
3    W1                  3
4    W2                  0
5    W1                  4
6    W2                  3
7    W1                  0
8    W2                  0
9    W1                  2
10   W2                  2

The q95 vector was:

q95 <- c(26,21,22,20,23)

So in the first week AL has 1 instance above its threshold value of 26. The UK has 2 instances above 23 (the UK's threshold) in the second week. And so on for every country and every week.

I handled a similar problem where the threshold did not vary by country but was a constant 30 degrees (I divide by 7 because there are seven days in a week):

DaysAbove30perWeek <- as.data.frame(tapply(testdlong$value > 30,
                                               ceiling(seq(nrow(testdlong))/7),sum))

Maybe a solution is to loop over countries? However, I can't figure out how to write that loop. Other solutions are welcome.

In the revised scenario you also need to compute a new column for the week:


q95 <- c(26,21,22,20,23)

c_q95 <- data.frame(Country = unique(DailyTempCountry$Country),
                    threshold = q95)

library(dplyr)

DailyTempCountry %>% left_join(c_q95, by = 'Country') %>%
  group_by(Country, Week = substr(Date, 1, 2)) %>%
  summarise(days = sum(DailyTemp > threshold), .groups = 'drop')

# A tibble: 10 x 3
   Country Week   days
   <chr>   <chr> <int>
 1 AL      W1        1
 2 AL      W2        2
 3 BE      W1        3
 4 BE      W2        0
 5 CA      W1        4
 6 CA      W2        3
 7 DE      W1        0
 8 DE      W2        0
 9 UK      W1        2
10 UK      W2        2

Created on 2021-05-05 by the reprex package (v2.0.0)
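
For readers who prefer to avoid the join, the same count can probably be done in base R by looking up each row's threshold in a named vector. This is only a sketch under assumptions: the names attached to q95 (AL, BE, CA, DE, UK) are my assumption about which threshold belongs to which country, inferred from the order in the question, and the data frame is rebuilt compactly with rep() and paste0() rather than typed out.

```r
# Toy data from the question: 5 countries, 2 weeks of 4 days each
DailyTempCountry <- data.frame(
  Date = rep(paste0("W", rep(1:2, each = 4), "D", 1:4), times = 5),
  Country = rep(c("AL", "BE", "CA", "DE", "UK"), each = 8),
  DailyTemp = c(27, 25, 20, 22, 20, 20, 27, 27,
                24, 22, 23, 18, 17, 19, 20, 16,
                23, 23, 23, 23, 27, 26, 20, 26,
                19, 18, 17, 19, 16, 15, 19, 18,
                20, 24, 24, 20, 19, 25, 19, 25)
)

# Assumption: thresholds are named by country, so the mapping is explicit
# and does not depend on the order of rows in the data
q95 <- c(AL = 26, BE = 21, CA = 22, DE = 20, UK = 23)

# Look up each row's own threshold by country name and compare
country <- as.character(DailyTempCountry$Country)
above <- DailyTempCountry$DailyTemp > unname(q95[country])

# The week label is the first two characters of Date, e.g. "W1"
week <- substr(DailyTempCountry$Date, 1, 2)

# Sum the TRUE values within each Country/Week combination
res <- aggregate(above, by = list(Country = country, Week = week), FUN = sum)
names(res)[3] <- "days"

# aggregate() varies the first grouping variable fastest, so reorder by country
res <- res[order(res$Country, res$Week), ]
res
```

Naming the threshold vector avoids relying on unique(DailyTempCountry$Country) returning countries in the same order as q95, which only holds if the data happen to be sorted that way.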

The OP has asked what to do when the date variable is in a different format from the one in the sample data:

time <- as.character(20000101:20000130)
> time
 [1] "20000101" "20000102" "20000103" "20000104" "20000105" "20000106" "20000107" "20000108" "20000109" "20000110"
[11] "20000111" "20000112" "20000113" "20000114" "20000115" "20000116" "20000117" "20000118" "20000119" "20000120"
[21] "20000121" "20000122" "20000123" "20000124" "20000125" "20000126" "20000127" "20000128" "20000129" "20000130"

library(lubridate)
time <- ymd(time)

# Either ISO week
isoweek(time)
# or week
week(time)

> isoweek(time)
 [1] 52 52  1  1  1  1  1  1  1  2  2  2  2  2  2  2  3  3  3  3  3  3  3  4  4  4  4  4  4  4
> # or week
> week(time)
 [1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4 5 5
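
To tie this back to the threshold count, here is a rough base R sketch of how a week number derived from "yyyymmdd" strings might be used as the grouping variable. Everything in it is hypothetical: the `daily` data frame, its temperatures, and the two thresholds are made up for illustration. The week is computed as the number of complete seven-day periods since January 1st plus one, which is how lubridate documents week(), so lubridate is not required to run it.

```r
# Hypothetical data: 14 days for two countries, dates stored as "yyyymmdd" strings
daily <- data.frame(
  time = rep(as.character(20000101:20000114), times = 2),
  Country = rep(c("AL", "BE"), each = 14),
  DailyTemp = c(27, 25, 20, 22, 20, 20, 27, 27, 24, 22, 23, 18, 17, 19,
                20, 16, 23, 23, 23, 23, 27, 26, 20, 26, 19, 18, 17, 19)
)

# Hypothetical thresholds, named by country
q95 <- c(AL = 26, BE = 21)

# Parse the strings into Date objects
daily$date <- as.Date(daily$time, format = "%Y%m%d")

# Week of year: complete seven-day periods since January 1st, plus one
daily$Week <- (as.integer(format(daily$date, "%j")) - 1L) %/% 7L + 1L

# Flag days above the country's own threshold
daily$above <- daily$DailyTemp > unname(q95[as.character(daily$Country)])

# Count flagged days per country and week
res <- aggregate(above ~ Country + Week, data = daily, FUN = sum)
res <- res[order(res$Country, res$Week), ]
res
```

If the data span more than one year, the week number alone repeats, so the year would presumably need to be added as a further grouping variable.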
