Convert date to month/year format for time series

I have some water quality sample data.

> dput(GrowingArealog90s[1:10,])
structure(list(SampleDate = structure(c(6948, 6949, 6950, 7516, 
7517, 7782, 7783, 7784, 8092, 8106), class = "Date"), Flog90 =  c(1.51851393987789, 
1.48970743802793, 1.81243963000062, 0.273575501327576, 0.874218895695207, 
1.89762709129044, 1.44012088794774, 0.301029995663981, 1.23603370361931, 
0.301029995663981)), .Names = c("SampleDate", "Flog90"), class = c("tbl_df", 
"data.frame"), row.names = c(NA, -10L))

This data is collected monthly, although some months are missed over the 25-year period.

I know there is a lot of help out there for converting dates to different formats, but I have not been able to figure this out. I want to create a time series with just a month/year format, so that I can do things like decompose the data by month and run seasonal Kendall tests. I have tried so many different ways of converting my date to the desired format that I have completely confused myself. I don't care about the exact format as long as it is recognized as month/year.

I also need to fill in the missing months with NAs.

I tried uploading the "SampleDate" column in a numeric format, "yyyymm". I could then merge that data frame with another that contained all the dates I need.

GA90 <- merge(Dates, GrowingArealog90s, by.x = "Date", by.y = "Date", all.x = TRUE)

However, when I converted the resulting data frame to a time series, it would not recognize the 12-month frequency.

 GA90ts <- as.ts(GA90, frequency(12))

> GA90ts
Time Series:
Start = 1 
End = 324 
Frequency = 1 

Any help with this is appreciated.

Here's how to do it with zoo. You'll get a warning, but it's OK for now. You'll get a series indexed by month and year.

series <-structure(list(SampleDate = structure(c(6948, 6949, 6950, 7516,
7517, 7782, 7783, 7784, 8092, 8106), class = "Date"), Flog90 =  c(1.51851393987789,
1.48970743802793, 1.81243963000062, 0.273575501327576, 0.874218895695207,
1.89762709129044, 1.44012088794774, 0.301029995663981, 1.23603370361931,
0.301029995663981)), .Names = c("SampleDate", "Flog90"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -10L))

library(zoo)
series <-as.data.frame(series) #to drop dplyr class
series.zoo <-zoo(series[,-1,drop=FALSE],as.yearmon(series[,1]))
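As far as I can tell, the warning comes from the index no longer being unique: several samples fall in the same month, so they collapse to the same yearmon value. A minimal sketch to check this on the sample dates (the names sample_dates, month_labels and samples_per_month are just illustrative):

```r
library(zoo)

# Sample dates from the question, rebuilt from their day counts since 1970-01-01
sample_dates <- as.Date(c(6948, 6949, 6950, 7516, 7517, 7782, 7783, 7784,
                          8092, 8106), origin = "1970-01-01")

# Collapse each date to its month; "%Y-%m" avoids locale-dependent month names
month_labels <- format(as.yearmon(sample_dates), "%Y-%m")

# Count samples per month; any count above 1 is a duplicated index entry
samples_per_month <- table(month_labels)
print(samples_per_month)
```

Months with more than one sample are the ones that need to be aggregated (mean, last reading, and so on) before the series can have one value per month.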

Best practice would be to keep your series with actual dates and use as.yearmon only when you actually need to make calculations or aggregate by month and year (aggregate.zoo).

The following is a matter of taste, but I've dealt with a lot of time series and I think zoo is superior to ts and xts. Much more flexible.

Now, to fill in missing values, you have to create a vector of dates. Here, I'm using a zoo object with actual dates. Merging with that vector leaves NA at the added dates; I then use na.locf, which is "last observation carried forward", to fill them. You could also look at na.approx.

series.zoo <- zoo(series[, -1, drop = FALSE], series[, 1])
# first()/last() are not part of zoo (xts provides them); min()/max() on the Date column need no extra package
my.seq <- seq(min(series[, 1]), max(series[, 1]), by = "month")
merged <- merge.zoo(series.zoo, zoo(, my.seq))
na.locf(merged)
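To see the difference between the two fill methods, here is a toy sketch on a made-up three-day series (the values and dates are invented for illustration, not taken from the question):

```r
library(zoo)

# Toy series with one gap in the middle; dates are equally spaced
toy <- zoo(c(1, NA, 3), as.Date("2000-01-01") + 0:2)

# na.locf repeats the last observed value into the gap
filled_locf <- na.locf(toy)

# na.approx interpolates linearly between the neighbouring observations
filled_approx <- na.approx(toy)

print(filled_locf)
print(filled_approx)
```

If you want to keep the NAs, as the question asks, just skip the fill step and use the merged object directly.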

UPDATE

With aggregate.

GrowingArealog90s <-structure(list(SampleDate = structure(c(6948, 6949, 6950, 7516,
7517, 7782, 7783, 7784, 8092, 8106), class = "Date"), Flog90 =  c(1.51851393987789,
1.48970743802793, 1.81243963000062, 0.273575501327576, 0.874218895695207,
1.89762709129044, 1.44012088794774, 0.301029995663981, 1.23603370361931,
0.301029995663981)), .Names = c("SampleDate", "Flog90"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -10L))

library(zoo);library(xts)
GrowingArealog90s <-as.data.frame(GrowingArealog90s) #to remove dplyr format
GrowingArealog90s.zoo <-zoo(GrowingArealog90s[,-1,drop=FALSE],as.Date(GrowingArealog90s[,1]))

#First aggregate by month. I chose to get the mean per month
GrowingArealog90s.agg <-aggregate(GrowingArealog90s.zoo, as.yearmon, mean) #replace mean with last to get last reading of the month

#Then create a sequence of months and merge it
my.seq <-seq.Date(first(GrowingArealog90s[,1]), last(GrowingArealog90s[,1]),by="month")
merged <-merge.zoo(GrowingArealog90s.agg ,zoo(,as.yearmon(my.seq)))
na.locf(merged)
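On the frequency problem in the question: as.ts() has no frequency argument, and frequency(12) is a call that asks for the frequency of the number 12, which is 1, so the result stays at Frequency = 1. If you need a base ts object with a 12-month frequency, one possible route is to build it with ts() from the monthly series, leaving the missing months as NA. This is only a sketch on the ten sample rows; the names monthly, all_months, filled and monthly_ts are mine, and the mean per month is an arbitrary choice:

```r
library(zoo)

# Sample data from the question
sample_dates <- as.Date(c(6948, 6949, 6950, 7516, 7517, 7782, 7783, 7784,
                          8092, 8106), origin = "1970-01-01")
flog90 <- c(1.51851393987789, 1.48970743802793, 1.81243963000062,
            0.273575501327576, 0.874218895695207, 1.89762709129044,
            1.44012088794774, 0.301029995663981, 1.23603370361931,
            0.301029995663981)

# One value per month (mean of the samples taken in that month)
monthly <- aggregate(zoo(flog90, sample_dates), as.yearmon, mean)

# Every month between the first and last sample; months without data become NA
all_months <- as.yearmon(seq(min(sample_dates), max(sample_dates), by = "month"))
filled <- merge(monthly, zoo(, all_months))

# Build a base ts object, stating the start (year, month) and frequency explicitly
first_month <- start(filled)
monthly_ts <- ts(as.numeric(coredata(filled)),
                 start = c(as.integer(format(first_month, "%Y")),
                           as.integer(format(first_month, "%m"))),
                 frequency = 12)

print(frequency(monthly_ts))
```

Note that some functions, decompose() for example, will not accept NAs, so you may still need na.locf or na.approx before that step.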
