[英]Count number of instances above a varying threshold
I have the 0.95 percentile threshold for temperature for each country.我有每个国家/地区温度的 0.95 个百分位阈值。 In the example below a week is 4 days.
在下面的示例中,一周是 4 天。 I want to count in a new vector/single-column-dataframe how many days each individual country's temperature is over that country's threshold on a weekly basis.
我想在一个新的向量/单列数据框中计算每个国家的温度每周有多少天超过该国家的阈值。
The country 95% percentile temperatures are:该国 95% 的百分位气温是:
q95 <- c(26,21,22,20,23)
DailyTempCountry <- data.frame(Date = c("W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
"W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
"W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
"W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
"W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4"),
Country = c("AL","AL", "AL", "AL","AL","AL", "AL", "AL",
"BE","BE", "BE", "BE", "BE","BE", "BE", "BE",
"CA","CA", "CA", "CA","CA","CA", "CA", "CA",
"DE","DE", "DE", "DE","DE","DE", "DE", "DE",
"UK","UK", "UK", "UK","UK","UK", "UK", "UK"),
DailyTemp = c(27,25,20,22,20,20,27,27,
24,22,23,18,17,19,20,16,
23,23,23,23,27,26,20,26,
19,18,17,19,16,15,19,18,
20,24,24,20,19,25,19,25))
DailyTempCountry
Date Country DailyTemp
1 W1D1 AL 27
2 W1D2 AL 25
3 W1D3 AL 20
4 W1D4 AL 22
5 W2D1 AL 20
6 W2D2 AL 20
7 W2D3 AL 27
8 W2D4 AL 27
9 W1D1 BE 24
10 W1D2 BE 22
11 W1D3 BE 23
12 W1D4 BE 18
13 W2D1 BE 17
14 W2D2 BE 19
15 W2D3 BE 20
16 W2D4 BE 16
17 W1D1 CA 23
18 W1D2 CA 23
19 W1D3 CA 23
20 W1D4 CA 23
21 W2D1 CA 27
22 W2D2 CA 26
23 W2D3 CA 20
24 W2D4 CA 26
25 W1D1 DE 19
26 W1D2 DE 18
27 W1D3 DE 17
28 W1D4 DE 19
29 W2D1 DE 16
30 W2D2 DE 15
31 W2D3 DE 19
32 W2D4 DE 18
33 W1D1 UK 20
34 W1D2 UK 24
35 W1D3 UK 24
36 W1D4 UK 20
37 W2D1 UK 19
38 W2D2 UK 25
39 W2D3 UK 19
40 W2D4 UK 25
What I want is a vector/column that counts the number of days in that week above the country's threshold like this:我想要的是一个向量/列,它计算该周中高于国家阈值的天数,如下所示:
DaysInWeekAboveQ95 <- c(1,2,3,0,4,3,0,0,2,2)
df_right <- data.frame(Week = c("W1","W2","W1","W2","W1","W2","W1","W2","W1","W2"),
DaysInWeekAboveQ95 = c(1,2,3,0,4,3,0,0,2,2))
Week DaysInWeekAboveQ95
1 W1 1
2 W2 2
3 W1 3
4 W2 0
5 W1 4
6 W2 3
7 W1 0
8 W2 0
9 W1 2
10 W2 2
The q95% vector was q95% 载体是
q95 <- c(26,21,22,20,23)
so in the first week AL have 1 instance above its threshold value 26
.所以在第一周 AL 有 1 个实例高于其阈值
26
。 UK have 2 instances above 23 (UK's threshold) in the second week.英国在第二周有 2 个实例高于 23(英国的阈值)。 And so for every country and every week.
对于每个国家和每周都是如此。
I handled a similar problem but where the threshold did not vary by country but was just a constant 30 degrees (where I divide by 7 because seven days in week)我处理了一个类似的问题,但是阈值没有因国家/地区而异,而只是恒定的 30 度(我除以 7,因为一周中有 7 天)
DaysAbove30perWeek <- as.data.frame(tapply(testdlong$value > 30,
ceiling(seq(nrow(testdlong))/7),sum))
Maybe a solution is to loop over countries?也许解决方案是遍历国家/地区? However, I can't figure out how to incorporate the specific loop.
但是,我无法弄清楚如何合并特定的循环。 Other solutions are welcome.
欢迎其他解决方案。
In revised scenario you also need calculating a new column for week too在修改后的场景中,您还需要计算一周的新列
q95 <- c(26,21,22,20,23)
c_q95 <- data.frame(Country = unique(DailyTempCountry$Country),
threshold = q95)
library(dplyr)
DailyTempCountry %>% left_join(c_q95, by = 'Country') %>%
group_by(Country, Week = substr(Date, 1, 2)) %>%
summarise(days = sum(DailyTemp > threshold), .groups = 'drop')
# A tibble: 10 x 3
Country Week days
<chr> <chr> <int>
1 AL W1 1
2 AL W2 2
3 BE W1 3
4 BE W2 0
5 CA W1 4
6 CA W2 3
7 DE W1 0
8 DE W2 0
9 UK W1 2
10 UK W2 2
Created on 2021-05-05 by the reprex package (v2.0.0)由代表 package (v2.0.0) 于 2021 年 5 月 5 日创建
OP has asked that date variable is in some different format than given in sample data OP 已要求日期变量的格式与示例数据中给出的格式不同
time <- as.character(20000101:20000130)
> time
[1] "20000101" "20000102" "20000103" "20000104" "20000105" "20000106" "20000107" "20000108" "20000109" "20000110"
[11] "20000111" "20000112" "20000113" "20000114" "20000115" "20000116" "20000117" "20000118" "20000119" "20000120"
[21] "20000121" "20000122" "20000123" "20000124" "20000125" "20000126" "20000127" "20000128" "20000129" "20000130"
library(lubridate)
time <- ymd(time)
# Either ISO week
isoweek(time)
# or week
week(time)
> isoweek(time)
[1] 52 52 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4
> # or week
> week(time)
[1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4 5 5
library(lubridate) time <- ymd(time)图书馆(润滑)时间 <- ymd(时间)
isoweek(time) week(time) isoweek(时间) 周(时间)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.