Rolling percentile for conditional selections in R

I have a data.frame with daily maximum and minimum temperatures for 40 years. I need to select all days whose maximum temperature is above the 90th percentile of maximum temperature and whose minimum temperature is above the 85th percentile of minimum temperature.

I was able to do that with the percentiles computed over the whole record:

> head(df)
  YEAR MONTH DAY     Date MEAN  MAX  MIN
1 1965     1   1 1/1/1965   NA 27.0 17.0
2 1965     1   2 1/2/1965 24.0 28.0 20.7
3 1965     1   3 1/3/1965 19.9 23.7 16.2
4 1965     1   4 1/4/1965 18.0 23.4 12.0
5 1965     1   5 1/5/1965 19.7 24.0 14.0
6 1965     1   6 1/6/1965 18.6 24.0 13.0


library(data.table)
setDT(df)  # := and rleid() need a data.table
df[, hotday := +(MAX >= quantile(MAX, .90, na.rm = TRUE, type = 6) & MIN >= quantile(MIN, .85, na.rm = TRUE, type = 6))
   ][, length := with(rle(hotday), rep(lengths, lengths))  # run length, so I can select consecutive days only
   ][hotday == 0, length := 0
   ][!!hotday, Highest_Mean := max(MEAN), rleid(length)][]  # highest mean temp for each consecutive group

But I need to do the same thing using centered rolling percentiles over a 15-day window (i.e., for a given day, the 90th percentile of maximum temperature is the 90th percentile of the historical data in a 15-day window centered on that day).

I mean that the percentile is to be calculated from the historical data of each calendar day using a 15-day calendar window. That is, there are 365 calendar days, so for day 118 I will use the historical data for days 111, 112, ..., 125. In my case I have 40 years of data, so the 15-day window yields a total sample size of 40 years × 15 days = 600 for each calendar day. The moving window is based on the calendar day, not on the time series.

Any thoughts, please?

What about something like this to select the rows you want?

Since you want a 15-day sliding window centered on the day of interest, every window consists of the 7 preceding days, the day of interest, and the 7 following days. Because of this constraint, the first 7 and the last 7 days (rows) of the dataset have no complete window, so they are excluded and forced to FALSE with rep(FALSE, 7).

The code inside the sapply() call tests each day (starting from day 7 + 1 = 8) against its 15-day sliding window (as defined above) and checks whether the maximum temperature is at or above the 90th percentile of that window (test1). A similar test (test2) is run on the MIN temperature; as written, it checks whether MIN is at or below the 15th percentile of the window. If either of the two tests is TRUE, TRUE is returned; otherwise FALSE is returned. You can easily adapt this to your needs, for example by using >= with 0.85 in test2 and combining the two tests with & to match the criteria in the question.

The resulting logical vector (stored in KEEP) contains TRUE/FALSE values that can be used to subset the initial data frame.

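set.seed(111)
# Toy data: 50 days of random MIN and MAX temperatures
df <- data.frame(MIN = sample(50:70, size = 50, replace = TRUE),
                 MAX = sample(70:90, size = 50, replace = TRUE))
head(df)

# Rows 1-7 and the last 7 rows have no complete 15-day window, so they are set to FALSE
KEEP <- c(rep(FALSE, 7),
          sapply(8:(length(df$MAX) - 7), function(i) {
            # MAX at or above the 90th percentile of the window centered on row i
            test1 <- df$MAX[i] >= as.numeric(quantile(df$MAX[(i-7):(i+7)], 0.9, na.rm = TRUE))
            # MIN at or below the 15th percentile of the same window
            test2 <- df$MIN[i] <= as.numeric(quantile(df$MIN[(i-7):(i+7)], 0.15, na.rm = TRUE))
            test1 | test2
          }),
          rep(FALSE, 7))
head(KEEP)
df <- df[KEEP, ]
df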

This should return:

   MIN MAX
10  51  86
13  51  73
14  50  75
15  53  89
22  55  83
28  55  90
31  51  72
32  60  88
37  52  84
42  56  87
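
Note that the window in the code above slides along the rows of the series, whereas the question asks for a window over calendar days pooled across all years. A minimal sketch of that variant is given here, under several assumptions that are not in the original post: the data are synthetic, leap days are dropped so every year has 365 days, the window wraps around the year boundary (so day 1 uses days 359 to 365 and 1 to 8), and the DOY column and the window_quantile() helper are hypothetical names introduced only for illustration. It applies the criteria from the question (MAX at or above the 90th percentile and MIN at or above the 85th percentile, both with type = 6).

```r
# Synthetic stand-in for the 40-year record (values are made up for illustration)
set.seed(1)
dates <- seq(as.Date("1965-01-01"), as.Date("2004-12-31"), by = "day")
dates <- dates[format(dates, "%m-%d") != "02-29"]  # assumption: drop leap days
doy   <- rep(1:365, times = 40)                    # calendar-day index, 1..365 in every year
df <- data.frame(Date = dates, DOY = doy,
                 MAX = 25 + 8 * sin(2 * pi * doy / 365) + rnorm(length(doy), sd = 3),
                 MIN = 15 + 8 * sin(2 * pi * doy / 365) + rnorm(length(doy), sd = 3))

# Hypothetical helper: for each calendar day d, pool every year's values whose
# calendar day lies within `half` days of d, and return the requested quantile.
window_quantile <- function(x, doy, prob, half = 7) {
  sapply(1:365, function(d) {
    # 15 calendar days centered on d; the modulo wraps around the year boundary
    win <- ((d - half - 1):(d + half - 1)) %% 365 + 1
    quantile(x[doy %in% win], prob, na.rm = TRUE, type = 6, names = FALSE)
  })
}

thr_max <- window_quantile(df$MAX, df$DOY, 0.90)  # one threshold per calendar day
thr_min <- window_quantile(df$MIN, df$DOY, 0.85)

# A day is flagged only if it passes both of its calendar-day thresholds
df$hotday <- df$MAX >= thr_max[df$DOY] & df$MIN >= thr_min[df$DOY]
hot <- df[which(df$hotday), ]
head(hot)
```

With 40 years of data, each threshold is estimated from 40 × 15 = 600 values, as described in the question. The hotday column can then be passed to the rle() step from the question to keep only runs of consecutive days.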
