Select interval of time series object by date and time

Question

My question is about how to manage the dates and times in an air quality database, which saved data every ten minutes all day, every day from 2002 through 2008.

I want to generate several analyses and plots, but only for the morning peak hours, from 6:00 through 8:00 am. I have tried to generate the diagrams for that interval, but R always plots all 24 hours of each day, which distorts the data for the peak hours.

I would hugely appreciate your guidance on how to select and plot only the peak-hour interval, and how to generate the diagrams.

I have the following script to select a date interval, but I also want to add an hour interval (6-8 am) and plot only the data in that interval:

# select interval
start.date = as.POSIXct("2007-03-27 05:00", tz = "GMT")
end.date = as.POSIXct("2007-05-27 05:00", tz = "GMT")
subdata = subset(mydata, date >= start.date & date <= end.date,
select = c(date, nox, co))
#
#plot the variables

Answer 1

I recommend you use a time series class instead of a data.frame. Subsetting by a time interval each day is easy with xts:

# use DWin's example data
Data <- data.frame(a=rnorm(240),
  dtm=as.POSIXct("2007-03-27 05:00", tz="GMT")+3600*(1:240))
# create xts object
library(xts)
x <- xts(Data[,"a"], Data[,"dtm"])
# subset by time of day
y <- x["T06:00/T08:00"]
# plot
plot(y)  # plots all 24 hours of each day
# use chartSeries from quantmod to avoid above behavior
library(quantmod)
chartSeries(y)
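
If quantmod is not an option, one base R alternative is to plot the peak readings against their observation index rather than against a continuous time axis, so the hours between peak windows take up no space. This is only a sketch: the data are made up, and the column name `nox` and the three-day range are assumptions for illustration.

```r
# Hypothetical data: three days of 10-minute readings (values are random)
set.seed(1)
dtm <- as.POSIXct("2007-03-27 00:00", tz = "GMT") + 600 * (0:(3 * 144 - 1))
peak <- data.frame(dtm = dtm, nox = runif(length(dtm), 5, 50))

# Keep 06:00 through 08:00 inclusive; zero-padded "HH:MM" strings sort correctly
hhmm <- format(peak$dtm, "%H:%M", tz = "GMT")
peak <- peak[hhmm >= "06:00" & hhmm <= "08:00", ]

# Plot against the observation index so the gaps between days collapse
idx <- seq_len(nrow(peak))
plot(idx, peak$nox, type = "l", xaxt = "n", xlab = "", ylab = "nox")

# Label the start of each day's peak window
ticks <- idx[format(peak$dtm, "%H:%M", tz = "GMT") == "06:00"]
axis(1, at = ticks, labels = format(peak$dtm[ticks], "%m-%d %H:%M", tz = "GMT"))
```

The trade-off is that the x axis is no longer a true time scale, so the spacing says nothing about elapsed time between days.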

Answer 2

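If your date-times are in a column called 'dtm', then this code should get the records that fall within the interval from 6 am to 8 am. Note that it matches on the hour alone, so with ten-minute data it also keeps the readings from 08:10 through 08:50.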

dfrm <- data.frame(a = rnorm(24),
                   dtm = as.POSIXct("2007-03-27 05:00", tz = 'GMT') + 3600 * (1:24))
sub6_8A <- subset(dfrm, strftime(dtm, "%H", tz = "GMT") %in% c('06', '07', '08'))
sub6_8A
           a                 dtm
1  0.5020823 2007-03-27 06:00:00
2 -0.7455312 2007-03-27 07:00:00
3  1.8035086 2007-03-27 08:00:00

You could also use an indexed approach with "[", but if you have NAs they would get dragged along unless you specifically excluded them.
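
The NA behaviour can be seen in a small sketch. The data below are hypothetical (one day of ten-minute readings with one timestamp deliberately set to NA), and the minutes-since-midnight filter is my own choice for getting an exact 08:00 upper bound, not something taken from the question.

```r
# Hypothetical 10-minute data for one day; one timestamp is set to NA on purpose
dtm <- as.POSIXct("2007-03-27 00:00", tz = "GMT") + 600 * (0:143)
dfrm <- data.frame(a = seq_along(dtm), dtm = dtm)
dfrm$dtm[5] <- NA

# Minutes since midnight, so the upper bound can be exactly 08:00
mins <- as.integer(format(dfrm$dtm, "%H", tz = "GMT")) * 60 +
        as.integer(format(dfrm$dtm, "%M", tz = "GMT"))
in_peak <- mins >= 6 * 60 & mins <= 8 * 60   # NA where the timestamp is NA

with_na    <- dfrm[in_peak, ]          # an NA index yields an all-NA row
without_na <- dfrm[which(in_peak), ]   # which() keeps only the TRUE positions
```

subset() drops the rows where the condition is NA on its own, which is why the subset() call above does not need which().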

Answer 3

If this was a data.frame, I would start by extracting the time of day for each entry into a new column and then tag each line with a "peak" flag, and then working with it becomes much easier. Ditto for day of week. Since there are only about 350k rows, this is going to be reasonably quick and it's a one-off, so you could do something ugly like:

# create some fake data
t1 <- as.POSIXct(paste('2012-06-16 0', 1:9, ':00', sep=''), tz='GMT')
N <- length(t1)
mydata <- data.frame(timestamp=t1, co=runif(N, 1,30), nox=runif(N, 5,50))

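# extract the time of day (HH:MM:SS) from the timestamp column;
# format() always includes the time part, unlike as.character() in some R versions
mydata$hour <- format(mydata$timestamp, '%H:%M:%S', tz='GMT')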
# is this a peak time?
mydata$peak <- regexpr('^0[678]', mydata$hour) >0

Now you can easily select out only those records that are from peak hours - which will be a much smaller subset to graph - less than 50k records.

mypeakdata <- subset(mydata, peak)

As I'm sure you are going to be doing many such analyses with different hypotheses, I'd suggest that you add various columns such as hour of day, day of week etc. to your data.frame and leave them there, and just save this big data.frame like:

# the file name must be passed as file=, otherwise save() treats it as an object name
save(mydata, file='mydata_version_2012-06-16_8h58.RData')
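
A minimal sketch of that workflow, assuming hourly fake data and a temporary file as a placeholder path: it adds numeric hour and day-of-week columns, flags the peak hours, then saves and reloads the data.frame. The column names `hour`, `wday` and `peak` are only illustrative.

```r
# Hypothetical data: one reading per hour for a single day
mydata <- data.frame(
  timestamp = as.POSIXct("2012-06-16 00:00", tz = "GMT") + 3600 * (0:23),
  co = seq(1, 24)
)

# Derived columns that can stay in the data.frame for later analyses
mydata$hour <- as.integer(format(mydata$timestamp, "%H", tz = "GMT"))
mydata$wday <- as.POSIXlt(mydata$timestamp, tz = "GMT")$wday  # 0 = Sunday
mydata$peak <- mydata$hour >= 6 & mydata$hour <= 8            # hours 06, 07, 08

# Save under a placeholder path and load it back; load() restores 'mydata' by name
out <- tempfile(fileext = ".RData")
save(mydata, file = out)
rm(mydata)
load(out)
```

If only one object is being stored, saveRDS() and readRDS() are an alternative that lets you choose the variable name when reading the file back.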
