Rolling percentile for conditional selections in R

Question

I have a data.frame with daily maximum and minimum temperatures for 40 years, and I need to select all days whose maximum temperature is above the 90th percentile of maximum temperature and whose minimum temperature is above the 85th percentile of minimum temperature.

I was able to do that using percentiles computed over the whole record:

> head(df)
  YEAR MONTH DAY     Date MEAN  MAX  MIN
1 1965     1   1 1/1/1965   NA 27.0 17.0
2 1965     1   2 1/2/1965 24.0 28.0 20.7
3 1965     1   3 1/3/1965 19.9 23.7 16.2
4 1965     1   4 1/4/1965 18.0 23.4 12.0
5 1965     1   5 1/5/1965 19.7 24.0 14.0
6 1965     1   6 1/6/1965 18.6 24.0 13.0


library(data.table)
setDT(df)  # the := syntax below needs a data.table

df[, hotday := +(MAX >= quantile(MAX, .90, na.rm = TRUE, type = 6) & MIN >= quantile(MIN, .85, na.rm = TRUE, type = 6))
   ][, length := with(rle(hotday), rep(lengths, lengths))  # run length, so I can select consecutive days only
   ][hotday == 0, length := 0
   ][!!hotday, Highest_Mean := max(MEAN), rleid(length)][]  # highest mean temperature in each consecutive group

But I need to do the same thing using centered rolling percentiles over a 15-day window (i.e., for a given day, the 90th percentile of maximum temperature is the 90th percentile of the historical data in a 15-day window centered on that day).

By that I mean the percentile should be calculated from the historical data for each calendar day, using a 15-day calendar window. There are 365 calendar days, so for day 118 I would use the historical data for days 111, 112, ..., 125. Since I have 40 years of data, the 15-day window yields a total sample size of 40 years × 15 days = 600 for each calendar day. The moving window is based on the calendar day, not on the time series.

Any thoughts, please?

Answer 1

What about something like this to select the rows you want?

Since you want a sliding window of 15 days centered on the day of interest, each window consists of the 7 preceding days, the day of interest, and the 7 following days. Because of this constraint, the first 7 and the last 7 days (rows) of the dataset have no complete window, so they are excluded by forcing them to FALSE with rep(FALSE, 7).

The code inside the sapply() call tests each day, starting from day 8 (7 + 1), against its 15-day sliding window (as defined above). test1 checks whether the MAX temperature is at or above the 90th percentile of that window. test2 is a similar check on the MIN temperature; as written, it checks whether MIN is at or below the 15th percentile of the window. If either test is TRUE, TRUE is returned; otherwise FALSE is returned. You can easily adapt this to your needs, for example by using >= with 0.85 in test2 and combining the tests with & to require both conditions.

The resulting vector (stored in KEEP) holds TRUE/FALSE values that can be used to subset the initial data frame.

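set.seed(111)
df <- data.frame(MIN = sample(50:70, size = 50, replace = TRUE),
                 MAX = sample(70:90, size = 50, replace = TRUE))
head(df)

n <- nrow(df)
KEEP <- c(rep(FALSE, 7),  # first 7 rows have no complete window
          sapply(8:(n - 7), function(i) {
            win <- (i - 7):(i + 7)  # 15-row window centered on row i
            test1 <- df$MAX[i] >= as.numeric(quantile(df$MAX[win], 0.9, na.rm = TRUE))
            # As written, test2 flags MIN at or below the window's 15th percentile;
            # for the question's criterion use >= and 0.85 here, and & below.
            test2 <- df$MIN[i] <= as.numeric(quantile(df$MIN[win], 0.15, na.rm = TRUE))
            test1 | test2
          }),
          rep(FALSE, 7))  # last 7 rows have no complete window
head(KEEP)
df <- df[KEEP, ]
df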

This returns only the rows of df for which KEEP is TRUE.
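Note that the window above slides along the rows of the time series. If the window should instead follow the calendar day and pool all years, as the question describes, a minimal base R sketch could look like the following. It is only an illustration under stated assumptions: the data are simulated, Feb 29 is dropped so that every year has 365 days, and the names doy, half, window_quantile, thr_max and thr_min are made up for this example.

```r
# Sketch: calendar-day (not time-series) rolling percentiles, base R only.
# Assumption: df has Date, MAX and MIN columns like the question's data;
# the values below are simulated purely for illustration.
set.seed(1)
dates <- seq(as.Date("1965-01-01"), as.Date("2004-12-31"), by = "day")
df <- data.frame(Date = dates,
                 MAX = rnorm(length(dates), mean = 25, sd = 5))
df$MIN <- df$MAX - runif(length(dates), min = 5, max = 12)

# Assumption: drop Feb 29 so that every year has exactly 365 calendar days.
df <- df[format(df$Date, "%m-%d") != "02-29", ]

# Calendar day 1..365, looked up against a non-leap reference year.
ref <- seq(as.Date("2001-01-01"), as.Date("2001-12-31"), by = "day")
df$doy <- match(format(df$Date, "%m-%d"), format(ref, "%m-%d"))

half <- 7  # 7 days before + the day itself + 7 days after = 15-day window

# For each calendar day d, pool all years' values on days d-7 .. d+7
# (wrapping around the year end) and take the quantile of the pooled sample.
window_quantile <- function(x, doy, prob, half = 7) {
  unname(sapply(1:365, function(d) {
    win <- ((d - half):(d + half) - 1) %% 365 + 1
    quantile(x[doy %in% win], prob, na.rm = TRUE, type = 6)
  }))
}

thr_max <- window_quantile(df$MAX, df$doy, 0.90, half)
thr_min <- window_quantile(df$MIN, df$doy, 0.85, half)

# Compare each day with the thresholds of its own calendar day.
df$hotday <- +(df$MAX >= thr_max[df$doy] & df$MIN >= thr_min[df$doy])
```

With 40 years of data, each calendar day's threshold is computed from 40 × 15 = 600 values, and the hotday column can then be passed to the rle() step from the question unchanged. How to treat Feb 29 is a design choice; dropping it is just the simplest option here.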
