Running percentile value for each calendar day from multi-year data in R
I need to calculate the 30-day running (window) 90th percentile of maximum temperature for each calendar day from multi-year data. For example, to calculate the 90th percentile value for January 1st, I have to take a 30-day window centered on January 1st, i.e. data from December 16 to January 15 for all 42 years. So I would have 1260 (30*42) data points for each day. I need the value for all 366 days. I have 42 years of daily data from 1980 to 2022, which look like this:
date tmax tmin
1981-01-01 19.2 5.4
1981-01-02 18.2 5
1981-01-03 16.1 3.8
1981-01-04 17.2 4.4
1981-01-05 15.7 2.4
1981-01-06 15.6 5.4
1981-01-07 11.2 4.1
1981-01-08 14.8 -1
1981-01-09 15 0.8
1981-01-10 16.2 -0.4
.........................
.........................
.........................
2022-12-25 17.4 4.4
2022-12-26 16.5 4.1
2022-12-27 17 5.4
2022-12-28 15.2 3.6
2022-12-29 8.1 7.7
2022-12-30 13.5 6
2022-12-31 14.8 4.5
How can I do this in R? Initially, I thought it would be as simple as this:
temp_data <- read.csv("temperature.csv")
#as the date and tmax data are being read as characters by R
temp_data$tmax <- as.numeric(temp_data$tmax)
temp_data$date <- as.Date(temp_data$date, "%Y-%m-%d")
#Create a day of year variable for the day of the year
temp_data$doy <- as.numeric(format(temp_data$date,"%j"))
#load libraries
library(dplyr)
library(zoo)
temp_data_90th <- temp_data %>%
  group_by(doy) %>%
  summarize(rolling_90th = rollapply(tmax, width = 30, FUN = quantile, probs = 0.9, align = "center", na.rm = TRUE))
But I don't think it gave the correct result, since temp_data_90th has 4,470 rows, with 13 values for each day of the year.
Can you please suggest where I am going wrong? Thank you in advance for your support.
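The 13 values per day of year are consistent with the rolling window running across years inside each doy group, rather than across neighbouring calendar days. A minimal sketch of that arithmetic, assuming each group holds one tmax value per year for 42 years (the vector `one_doy` is a made-up stand-in, not the real data):

```r
# Hypothetical stand-in for one day-of-year group: one tmax value per year.
set.seed(1)
one_doy <- rnorm(42, mean = 18, sd = 3)

# embed() lists every run of 30 consecutive values, i.e. the windows a
# width-30 rolling function would visit inside this single group.
windows <- embed(one_doy, 30)
n_windows <- nrow(windows)   # 42 - 30 + 1
n_windows
```

Each of those windows mixes 30 different years of the same calendar day, which is not the pooling of 30 neighbouring days described in the question.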
To illustrate this we need reproducible data, so use the DF shown in the Note at the end.
Calculate yday, the 0-based day of the year for each row of DF. Then, for each possible yday (0:365), take value from all rows whose yday lies between 15 days before and 14 days after that yday, modulo 366, and apply quantile to those values, giving q90.
No packages are used.
yday <- as.POSIXlt(DF$date)$yday
q90 <- sapply(0:365, function(x)
  quantile(DF$value[yday %in% (seq(x-15, x+14) %% 366)], probs = 0.9, na.rm = TRUE))
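As a sanity check on the windows used above, one can count how many observations get pooled for each day of year. This is only a sketch run on the toy DF from the Note (23 years, 2000 to 2022); `n_pooled` is a name introduced here for illustration.

```r
# Toy data, same construction as the Note.
d <- seq(as.Date("2000-01-01"), as.Date("2022-12-31"), "day")
DF <- data.frame(date = d, value = seq_along(d))

# 0-based day of year for every row.
yday <- as.POSIXlt(DF$date)$yday

# Number of rows falling in each 30-day window (15 back, 14 forward, wrapping
# around the year end via modulo 366).
n_pooled <- sapply(0:365, function(x) sum(yday %in% (seq(x - 15, x + 14) %% 366)))

length(n_pooled)
range(n_pooled)
```

Windows that contain yday 365 pool slightly fewer values, because that slot only exists in leap years.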
With rollapply it is slightly shorter. Using yday from above we have:
library(zoo)
q90 <- rollapply(seq(-15, 365 + 14) %% 366, 30, function(x)
  quantile(DF$value[yday %in% x], probs = 0.9, na.rm = TRUE))
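Either way q90 is a bare vector of 366 numbers, so it can help to attach labels. The sketch below uses the base R version and, as an assumption of this example only, labels each slot with the matching date in the leap year 2000. In non-leap years every day after February 28 has a yday one lower than in leap years, so the labels are approximate by one day for those rows.

```r
# Toy data, same construction as the Note.
d <- seq(as.Date("2000-01-01"), as.Date("2022-12-31"), "day")
DF <- data.frame(date = d, value = seq_along(d))
yday <- as.POSIXlt(DF$date)$yday

q90 <- sapply(0:365, function(x)
  quantile(DF$value[yday %in% (seq(x - 15, x + 14) %% 366)], probs = 0.9, na.rm = TRUE))

# One row per day-of-year slot; `label` is the month-day of that slot in a
# leap reference year (2000), chosen here purely for illustration.
result <- data.frame(
  yday  = 0:365,
  label = format(as.Date("2000-01-01") + 0:365, "%m-%d"),
  q90   = unname(q90)
)
head(result, 3)
```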
Note
d <- seq(as.Date("2000-01-01"), as.Date("2022-12-31"), "day")
DF <- data.frame(date = d, value = seq_along(d))
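To carry the same idea over to the layout in the question (columns date and tmax), something like the following should work. The data here are synthetic, generated only so the example runs without temperature.csv; the seasonal sine curve and noise level are arbitrary assumptions, and `tmax_q90` is a name made up for this sketch.

```r
# Synthetic stand-in for temperature.csv (assumption: columns `date` and `tmax`).
set.seed(42)
dates <- seq(as.Date("1981-01-01"), as.Date("2022-12-31"), "day")
doy0 <- as.POSIXlt(dates)$yday   # 0-based day of year
temp_data <- data.frame(
  date = dates,
  tmax = 20 + 10 * sin(2 * pi * (doy0 - 100) / 366) + rnorm(length(dates), sd = 2)
)

# Same windowing as in the answer, applied to tmax instead of value.
yday <- as.POSIXlt(temp_data$date)$yday
tmax_q90 <- sapply(0:365, function(x) {
  in_window <- yday %in% (seq(x - 15, x + 14) %% 366)
  unname(quantile(temp_data$tmax[in_window], probs = 0.9, na.rm = TRUE))
})
length(tmax_q90)   # one value per day-of-year slot
```

With real data, replace the synthetic temp_data with the data frame read from the CSV, after converting date with as.Date as in the question.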