I'm trying to turn one column of a dataset, which holds daily samples for different devices, into a time series column grouped by two keys (hour and factor1).
The data I have looks like this:
         date hour factor1 volume    wkday
1: 2015-10-01   AM   11011    530 Thursday
2: 2015-10-01   AM   11012   1535 Thursday
3: 2015-10-01   AM   11021    191 Thursday
4: 2015-10-01   AM   11131   1108 Thursday
5: 2015-10-01   AM   11132   1518 Thursday
6: 2015-10-01   AM   11141    508 Thursday
date runs from 2015-10-01 to 2017-08-01, hour has two levels (AM and PM), factor1 has many levels, and wkday is not needed so far. The column I want to turn into time series data is volume.
I tried this:
# decimal_date() comes from the lubridate package
table_11011 = table[factor1 == '11011']
table_11011_am = table_11011[hour == 'AM']
table_11011_am[, vol_ts := ts(table_11011_am[, volume],
                              start = decimal_date(table_11011_am[, date][1]),
                              frequency = 365)]
This gives the desired output for a single factor1 level and hour, but when I try to generalize it to all factor1 levels and hours, I don't know how to supply the correct start and end dates. So far I have managed the following, but the output looks wrong:
table[, vol_ts := ts(volume,
                     start = decimal_date(table[, date][1]),
                     frequency = 365),
      by = c('factor1', 'hour')]
Any help would be appreciated!
Not sure if I 100% follow the intended usage here, but here's a stab at how I might approach a similar problem.
Basically, you can use seq.Date() to generate a regular series of dates, then use data.table's CJ() (cross join) function to repeat that series for each combination of your hours and factors.
Once you have a regular series, you can join in your raw data to get the regularly spaced data I think you're looking for. I've never really dealt with specialized time-series type objects in R; I've always been able to accomplish everything I need with the data.table, zoo, and RcppRoll packages.
Hope this may be of some help.
library(data.table)
DT <- data.table(Date    = as.Date(c("2015-10-01", "2015-10-25", "2015-11-04", "2015-11-06")),
                 hour    = c("AM", "PM", "AM", "PM"),
                 factor1 = c("A", "B", "C", "D"),
                 volume  = c(1, 2, 3, 4))
## Create a regular sequence of all dates in range
## with a row for each combo of hour and factor1
TS <- CJ(Date    = seq.Date(from = DT[, min(Date)], to = DT[, max(Date)], by = "day"),
         hour    = DT[, unique(hour)],
         factor1 = DT[, unique(factor1)])
## Join the data to this expanded time series
setkey(DT, Date, hour, factor1)
setkey(TS, Date, hour, factor1)
TS <- DT[TS]
## Fill with zeros if necessary
TS[is.na(volume), volume := 0]
## If you want a separate column for each factor level
Wide <- dcast(TS, ... ~ factor1, value.var = "volume")
## Or if you want a column (time series) for each combo
VeryWide <- dcast(TS, ... ~ factor1 + hour, value.var = "volume")
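If actual ts objects are still wanted, here is a rough sketch of one way to build one per factor1/hour combination. Note that in the grouped attempt from the question, table[, date][1] always picks the first date of the whole table rather than of the current group, which is probably part of the problem. The sketch below instead computes the start from each group's own dates and keeps the results in a list column. The toy data and the make_ts() helper are made up for illustration, and it assumes each group is already a gap-free daily series (for example after the CJ() join above).

```r
library(data.table)

# Toy data (hypothetical values, same shape as the question's table)
DT <- data.table(
  date    = rep(seq(as.Date("2015-10-01"), by = "day", length.out = 5), times = 2),
  hour    = rep(c("AM", "PM"), each = 5),
  factor1 = "11011",
  volume  = 1:10
)

# Hypothetical helper: one ts per group, starting at that group's first date.
# The start is given as c(year, day of year), using base R only.
make_ts <- function(dates, values) {
  first <- min(dates)
  yr    <- as.integer(format(first, "%Y"))
  doy   <- as.integer(format(first, "%j"))
  ts(values, start = c(yr, doy), frequency = 365)
}

# Order by date within each group, then store each ts in a list column
setorder(DT, factor1, hour, date)
ts_list <- DT[, .(vol_ts = list(make_ts(date, volume))), by = .(factor1, hour)]

# Pull out a single series, e.g. factor1 11011 in the AM
am_ts <- ts_list[factor1 == "11011" & hour == "AM", vol_ts[[1]]]
```

A list column is used because each group gets its own ts object with its own start; the result has one row per factor1/hour combination rather than one row per observation.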