
padr in R: padding at user-defined interval

I'm working with time series data at 5-minute intervals. Some of the 5-minute observations are missing. I'd like to resample the dataset so that the missing 5-minute periods are filled in with NA values. I found great information on how to approach this here: "R: Insert rows for missing dates/times".

I've created a data.frame "df" with a POSIXct timeseries column "time".

The pad function in the padr package allows a user to set an interval by the minute, hour, day, etc.

> interval
> The interval of the returned datetime variable. When NULL the interval will be equal to the interval of the datetime variable. When specified it can only be lower than the interval of the input data. See Details.

padr's pad function will create 1-minute intervals on my 5-minute data. How do I set my own user-defined interval (e.g. 5 minutes)?

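Try padding to the minute first, then aggregating to the interval you want. The grouping step lets you apply whatever custom summary you need. Note that sum(..., na.rm = TRUE) returns 0 for the empty periods rather than NA, as the output below shows.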

library(padr)
library(dplyr)  # provides %>%, mutate(), group_by() and summarise()
account <- data.frame(day     = as.Date(c('2016-10-21', '2016-10-23', '2016-10-26')),
                      balance = c(304.46, 414.76, 378.98))

account %>% 
  pad('min') %>%   ##pad to the minute
  mutate(five_min = cut(day, "5 min")) %>%   ##create new 'five_min' column
  group_by(five_min) %>%     ## group by the new col
  summarise(ttl = sum(balance, na.rm=TRUE))  ##aggregate the new sum
# # A tibble: 1,441 × 2
#               five_min    ttl
#                 <fctr>  <dbl>
# 1  2016-10-21 00:00:00 304.46
# 2  2016-10-21 00:05:00   0.00
# 3  2016-10-21 00:10:00   0.00
# 4  2016-10-21 00:15:00   0.00
# 5  2016-10-21 00:20:00   0.00
# 6  2016-10-21 00:25:00   0.00
# 7  2016-10-21 00:30:00   0.00
# 8  2016-10-21 00:35:00   0.00
# 9  2016-10-21 00:40:00   0.00
# 10 2016-10-21 00:45:00   0.00
# # ... with 1,431 more rows

I couldn't get Pierre's solution to run with my data format (which, admittedly, I didn't specify in the question), but I was able to build a solution on Pierre's strategy by selecting a 5-minute subset of the padded 1-minute data. I'm excited about this new padr library and hope more functionality is added down the road.

My strategy was the following:

library(padr)
library(zoo)
dfpad <- pad(df, interval = "min")  # resample df to 1-minute intervals
dfpadzoo <- zoo(dfpad, order.by = dfpad$time)  # convert padded df to a zoo time series
sensStart <- start(dfpadzoo)  # first time in the data (zoo function)
sensEnd <- end(dfpadzoo)  # last time in the data (zoo function)
nexttime <- df$time[2]  # time in the second data row
# Determine the sampling interval in minutes.
# This assumes there is no gap between the first two rows of df.
tint_min <- as.double(difftime(nexttime, sensStart, tz = "UTC", units = "mins"))
# Generate a regularly spaced time sequence from the start to the end of the data
# (for POSIXct sequences a numeric 'by' is in seconds):
timeFill <- seq(from = as.POSIXct(sensStart, tz = "UTC"),
                to = as.POSIXct(sensEnd, tz = "UTC"), by = 60 * tint_min)
# Subset the padded series at that spacing (5 minutes for my data)
sensdatazoo <- dfpadzoo[timeFill]

By converting the df to a zoo object, I was able to employ additional time series functionality found in the zoo library.
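For readers without padr or zoo, the same idea (build a regular time grid, then line the data up against it) can be sketched in base R. This is only an illustration under assumptions: the data frame, its column names `time` and `value`, and the sample values are hypothetical, not the asker's data.

```r
# Hypothetical 5-minute series with the 00:10 and 00:15 readings missing
df <- data.frame(
  time  = as.POSIXct(c("2016-10-21 00:00:00",
                       "2016-10-21 00:05:00",
                       "2016-10-21 00:20:00"), tz = "UTC"),
  value = c(1.2, 3.4, 5.6)
)

# Regular 5-minute grid spanning the observed time range
grid <- data.frame(time = seq(min(df$time), max(df$time), by = "5 min"))

# Left join: every grid time is kept, and times with no observation get NA
filled <- merge(grid, df, by = "time", all.x = TRUE)
print(filled)
```

Unlike the sum-based aggregation, this leaves the missing periods as NA, which is what the question asked for.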

A new version of padr hit CRAN yesterday. Intervals are no longer restricted to a single unit, so you can now use something like "5 min":

library(padr)
library(dplyr)
coffee %>% thicken("5 min") %>% select(-time_stamp) %>% pad()
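If the timestamps already sit on a 5-minute grid, as in the question, it should be possible to skip thicken() and pass the interval straight to pad(). A minimal sketch, assuming a padr version recent enough to accept multi-unit intervals; the data frame and its column names are hypothetical:

```r
library(padr)

# Hypothetical 5-minute series with the 00:10 and 00:15 readings missing
df <- data.frame(
  time  = as.POSIXct(c("2016-10-21 00:00:00",
                       "2016-10-21 00:05:00",
                       "2016-10-21 00:20:00"), tz = "UTC"),
  value = c(1.2, 3.4, 5.6)
)

# Request a 5-minute grid directly; the inserted rows get NA in `value`
padded <- pad(df, interval = "5 min")
print(padded)
```

Note that thicken() rounds timestamps down to the chosen interval, so the thicken-then-pad route is the better fit when the raw times are irregular.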
