
Select interval of time series object by date and time

My question is about how to manage the dates and times in an air quality database that recorded data every ten minutes, all day, every day, from 2002 through 2008.

I want to generate several analyses and plots, but only for the morning peak hours, which run from 6:00 through 8:00 am. I have tried to generate the diagrams for that interval, but R always plots all 24 hours of each day, which distorts the available data for the peak hours.

I would hugely appreciate your guidance on how to select and plot only the peak-hour interval and how to generate the several diagrams.

I have the following script to select a date interval, but I want to add an hour interval (6-8 am) and plot only the data in that interval:

# select interval
start.date = as.POSIXct("2007-03-27 05:00", tz = "GMT")
end.date = as.POSIXct("2007-05-27 05:00", tz = "GMT")
subdata = subset(mydata, date >= start.date & date <= end.date,
select = c(date, nox, co))
#
#plot the variables

I recommend you use a time series class instead of a data.frame. Subsetting by a time interval each day is easy with xts:

# use DWin's example data
Data <- data.frame(a=rnorm(240),
  dtm=as.POSIXct("2007-03-27 05:00", tz="GMT")+3600*(1:240))
# create xts object
library(xts)
x <- xts(Data[,"a"], Data[,"dtm"])
# subset by time of day
y <- x["T06:00/T08:00"]
# plot
plot(y)  # plots all 24 hours of each day
# use chartSeries from quantmod to avoid above behavior
library(quantmod)
chartSeries(y)
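
If you would rather stay in base R, one possible workaround for the gaps is to plot the peak-hour values against their observation index and label the axis with the dates yourself. This is only a sketch under assumptions: the data frame `dat`, its column names, and the random `nox` values are made up for illustration, and the figure is written to a temporary PDF so the code also runs non-interactively.

```r
# Assumed sample data: 10-minute readings over three days (values are random)
set.seed(1)
dtm <- seq(as.POSIXct("2007-03-27 00:00", tz = "GMT"),
           as.POSIXct("2007-03-29 23:50", tz = "GMT"), by = "10 min")
dat <- data.frame(dtm = dtm, nox = runif(length(dtm), 5, 50))

# Time of day as an integer HHMM (e.g. 06:10 becomes 610); keep 06:00 to 08:00
hm <- as.integer(format(dat$dtm, "%H%M", tz = "GMT"))
peak <- dat[hm >= 600 & hm <= 800, ]

# Plot against the observation index so the hours outside the peak take no space
out <- tempfile(fileext = ".pdf")
pdf(out)
plot(seq_len(nrow(peak)), peak$nox, type = "l", xaxt = "n",
     xlab = "", ylab = "nox")
# Put one tick at the first peak-hour reading of each day
day_start <- which(!duplicated(as.Date(peak$dtm, tz = "GMT")))
axis(1, at = day_start,
     labels = format(peak$dtm[day_start], "%Y-%m-%d", tz = "GMT"))
dev.off()
```

The x axis is no longer a true time axis here, so the spacing between days is not to scale; that is the trade-off for dropping the empty hours.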

If your date-times are in a column called 'dtm', then this code should get the records that fall within the interval 6 AM to 8 AM (hours 06, 07, and 08):

dfrm <- data.frame(a=rnorm(24),
                   dtm=as.POSIXct("2007-03-27 05:00", tz='GMT') + 3600*(1:24))
sub6_8A <- subset(dfrm, strftime(dtm, "%H", tz="GMT") %in% c('06','07','08'))
sub6_8A
           a                 dtm
1  0.5020823 2007-03-27 06:00:00
2 -0.7455312 2007-03-27 07:00:00
3  1.8035086 2007-03-27 08:00:00
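
With ten-minute data, matching on the hour string '08' would also keep 08:10 through 08:50. If the interval should stop at 08:00 exactly, and you also want to keep the date range from the question, something along these lines might work. It is a sketch only: the sample `mydata`, its random values, and the helper column `mins` are assumptions for illustration.

```r
# Assumed sample data: 10-minute readings over four days (values are random)
set.seed(1)
dtm <- seq(as.POSIXct("2007-03-26 00:00", tz = "GMT"),
           as.POSIXct("2007-03-29 23:50", tz = "GMT"), by = "10 min")
mydata <- data.frame(date = dtm,
                     nox = runif(length(dtm), 5, 50),
                     co = runif(length(dtm), 1, 30))

# Date interval, as in the question's script
start.date <- as.POSIXct("2007-03-27 00:00", tz = "GMT")
end.date <- as.POSIXct("2007-03-28 23:50", tz = "GMT")

# Minutes since midnight for each reading (hypothetical helper column)
lt <- as.POSIXlt(mydata$date, tz = "GMT")
mydata$mins <- lt$hour * 60 + lt$min

# Keep readings inside the date range AND between 06:00 and 08:00 inclusive
peak <- subset(mydata,
               date >= start.date & date <= end.date &
                 mins >= 6 * 60 & mins <= 8 * 60,
               select = c(date, nox, co))
nrow(peak)  # 13 readings per day over 2 days = 26
```

Comparing numbers instead of hour strings makes it easy to move either end of the window, e.g. `mins < 8 * 60` to exclude 08:00 itself.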

You could also use an indexed approach with "[", but if you have NAs they would get dragged along unless you specifically excluded them.
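
To illustrate the NA point, here is a small sketch with one deliberately missing timestamp. The data are invented, and note that the effect shows up with comparison operators; `%in%` returns FALSE for NA, so it would not drag the row along.

```r
# Hypothetical data with one missing timestamp in row 3
dfrm <- data.frame(a = 1:4,
                   dtm = as.POSIXct(c("2007-03-27 06:00", "2007-03-27 07:00",
                                      NA, "2007-03-27 12:00"), tz = "GMT"))

# Hour of day as a number; NA for the missing timestamp
h <- as.integer(format(dfrm$dtm, "%H", tz = "GMT"))
keep <- h >= 6 & h <= 8          # TRUE TRUE NA FALSE

with_na <- dfrm[keep, ]          # "[" keeps an all-NA row for the NA index
no_na <- dfrm[which(keep), ]     # which() drops the NA
via_subset <- subset(dfrm, keep) # subset() treats NA as FALSE

nrow(with_na)     # 3
nrow(no_na)       # 2
nrow(via_subset)  # 2
```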

If this were a data.frame, I would start by extracting the time of day for each entry into a new column and then tag each row with a "peak" flag; working with it then becomes much easier. Ditto for day of week. Since there are only about 350k rows, this is going to be reasonably quick, and it's a one-off, so you could do something ugly like:

# create some fake data
t1 <- as.POSIXct(paste('2012-06-16 0', 1:9, ':00', sep=''), tz='GMT')
N <- length(t1)
mydata <- data.frame(timestamp=t1, co=runif(N, 1,30), nox=runif(N, 5,50))

# extract the time of day (HH:MM:SS) from each timestamp
mydata$hour <- format(mydata$timestamp, '%H:%M:%S', tz='GMT')
# is this a peak time (hour 06, 07, or 08)?
mydata$peak <- grepl('^0[678]', mydata$hour)

Now you can easily select only those records that are from peak hours, which will be a much smaller subset to graph: fewer than 50k records.

mypeakdata <- subset(mydata, peak)
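
The "ditto for day of week" part could look roughly like this. It is a sketch on invented hourly data for one week; the column names `dow` and `weekday` are my own choices, and the peak flag mirrors the one above (hours 06, 07, and 08).

```r
# Assumed sample data: hourly readings, Monday 2012-06-11 through Sunday 2012-06-17
set.seed(42)
ts <- seq(as.POSIXct("2012-06-11 00:00", tz = "GMT"),
          as.POSIXct("2012-06-17 23:00", tz = "GMT"), by = "hour")
mydata <- data.frame(timestamp = ts, co = runif(length(ts), 1, 30))

# Numeric hour of day (0-23) and ISO day of week (1 = Monday, ..., 7 = Sunday)
mydata$hour <- as.integer(format(mydata$timestamp, "%H", tz = "GMT"))
mydata$dow <- as.integer(format(mydata$timestamp, "%u", tz = "GMT"))

# Flags that can stay in the data.frame for later analyses
mydata$peak <- mydata$hour %in% 6:8
mydata$weekday <- mydata$dow <= 5

# Weekday peak-hour records only
wk_peak <- subset(mydata, peak & weekday)

# Example analysis: mean CO per hour of day within the weekday peak
aggregate(co ~ hour, data = wk_peak, FUN = mean)
```

Keeping the hour and day of week as numbers rather than strings makes later filters (`hour %in% 6:8`, `dow <= 5`) and grouping a little less fiddly than regular expressions.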

As I'm sure you are going to be doing many such analyses with different hypotheses, I'd suggest that you add various columns such as hour of day, day of week, etc. to your data.frame and leave them there, and just save this big data.frame like:

save(mydata, file='mydata_version_2012-06-16_8h58.RData')
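
For completeness, a round trip might look like the following sketch. The small data.frame and the file name are placeholders, and the file goes to a temporary directory here only so the example cleans up after itself.

```r
# Hypothetical small data.frame with a precomputed flag column
mydata <- data.frame(
  timestamp = as.POSIXct("2012-06-16 06:00", tz = "GMT") + 600 * (0:5),
  co = c(3.1, 4.7, 5.2, 6.0, 4.4, 3.9)
)
mydata$peak <- TRUE

# Save under an explicit file= argument (placeholder file name)
path <- file.path(tempdir(), "mydata_with_flags.RData")
save(mydata, file = path)

# In a later session: load() restores the object under its original name
rm(mydata)
load(path)
str(mydata)
```

If you prefer to pick the variable name at load time, `saveRDS(mydata, path)` and `mydata2 <- readRDS(path)` are the usual alternative.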
