Define start and end within ts function in data.table groupby

I'm trying to transform one column of a dataset, which holds daily samples for different devices, into a time series column grouped by two keys (hour and factor1).

The data I have looks something like this:

         date    hour factor1 volume    wkday 
1: 2015-10-01     AM   11011    530  Thursday    
2: 2015-10-01     AM   11012   1535  Thursday    
3: 2015-10-01     AM   11021    191  Thursday    
4: 2015-10-01     AM   11131   1108  Thursday    
5: 2015-10-01     AM   11132   1518  Thursday    
6: 2015-10-01     AM   11141    508  Thursday    

date runs from 2015-10-01 to 2017-08-01, hour has two levels (AM and PM), factor1 has many levels, and wkday is not needed so far. The column I want to turn into time series data is volume.

I tried this:

library(data.table)
library(lubridate)  # provides decimal_date()

table_11011 = table[factor1 == '11011']
table_11011_am = table_11011[hour == 'AM']

table_11011_am[, vol_ts := ts(table_11011_am[,volume],
                  start = decimal_date(table_11011_am[, date][1]),
                  frequency = 365)]

This gives the desired output for a single factor1 level and hour. But when I try to do the same for all the different factor1 levels and hours at once, I don't know how to supply the correct start and end dates. So far I have managed the following, but it seems to give bad output.

table[, vol_ts := ts(volume,
                   start = decimal_date(table[, date][1]), frequency = 365), by = c('factor1', 'hour')]

Any help would be appreciated!

I'm not sure I 100% follow the intended usage here, but here's a stab at how I might approach a similar problem.

Basically, you can use seq.Date() to generate a regular series of dates, then use data.table's CJ() (cross join) function to repeat that series for each combination of your hours and factors.

Once you have a regular series, you can join in your raw data to get the regularly spaced data I think you're looking for. I've never really dealt with specialized time-series objects in R; I've always been able to accomplish everything I need with the data.table, zoo, and RcppRoll packages.

Hope this may be of some help.

library(data.table)

DT <- data.table(Date = as.Date(c("2015-10-01","2015-10-25","2015-11-04","2015-11-06")),
                 hour = c("AM","PM","AM","PM"),
                 factor1 = c("A","B","C","D"),
                 volume = c(1,2,3,4))

## Create a regular sequence of all dates in range 
## with a row for each combo of hour and factor1
TS <- CJ(Date = seq.Date(from = DT[,min(Date)], to = DT[,max(Date)],by = "day"),
         hour = DT[,unique(hour)],
         factor1 = DT[,unique(factor1)])

## Join the data to this expanded time series

setkey(DT,Date,hour,factor1)
setkey(TS,Date,hour,factor1)

TS <- DT[TS]

## Fill with zeros if necessary
TS[is.na(volume), volume := 0]

## If you want a separate column for each factor level
Wide <- dcast(TS, ... ~ factor1, value.var = "volume")

## Or if you want a column (time series) for each combo
VeryWide <- dcast(TS, ... ~ factor1 + hour, value.var = "volume")
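If you do still want actual ts objects, here is a rough sketch of one way the per-group start could be handled. The likely problem with the attempt in the question is that table[, date][1] looks at the whole table, so every group gets the same start; inside j with by, a bare date[1] refers to the current group only. The sample data and the names dt and ts_by_group are made up for illustration, and I use data.table's year() and yday() in place of decimal_date() only to keep the example free of extra packages. The frequency of 365 is carried over from the question. Since ts assumes evenly spaced observations, this presumably makes the most sense after the data has been expanded to a regular daily grid as above.

```r
library(data.table)

# Hypothetical sample: two factor1/hour groups that start on different dates
dt <- data.table(
  date    = as.Date(c("2015-10-01", "2015-10-02", "2015-10-03",
                      "2015-10-05", "2015-10-06")),
  hour    = c("AM", "AM", "AM", "PM", "PM"),
  factor1 = c("11011", "11011", "11011", "11012", "11012"),
  volume  = c(530, 541, 498, 1535, 1490)
)

# Sort so that date[1] is the earliest date within each group
setorder(dt, factor1, hour, date)

# Inside j, `date` is the current group's subset, so date[1] is that
# group's own first date. Wrapping the ts in list() stores one ts object
# per group in a list column, which keeps its start/frequency attributes.
ts_by_group <- dt[, .(vol_ts = list(ts(volume,
                                       start = c(year(date[1]), yday(date[1])),
                                       frequency = 365))),
                  by = .(factor1, hour)]

# Inspect the start of one of the series
ts_by_group[factor1 == "11012" & hour == "PM", start(vol_ts[[1]])]
```

Each element of vol_ts can then be pulled out with [[ ]] and passed to whatever function expects a ts. An end argument should not be needed, because ts derives the end from the start, the frequency, and the length of the data.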

