
Date format when plotting a time series in R

My data frame df is a daily time series with the variables Datum and Opbrengst. The Datum variable ranges from 20160101 to 20170521.

      Datum  Opbrengst
1   20160101  40609276
2   20160102  79381098
3   20160103 114653269
4   20160104 126044535
5   20160105 180472785
...

I want to make predictions, so the first thing I do is plot the series to see whether it is stationary (or whether it has seasonality).

However, the date variable is numeric, so when I plot the series,

 ggplot(data=df, aes(x=Datum , y=Opbrengst, group=1)) +
    geom_line()+
    geom_point()

it becomes like this:

[image: graph]

The problem is that the series crosses years, which is why R just treats it as a numeric series, not a time series.

I tried to convert it to dates by using the method from this website

 df$Datum = as.Date(df$Datum)

but the result is incorrect:

 "57166-06-26" "57166-06-27" "57166-06-28" "57166-06-29" "57166-06-30" "57166-07-01"

My questions are:

  1. How do I change the Datum variable to the date format so that I won't have a problem when I plot the graph? Later I will need to do both daily and weekly predictions.

  2. I know that if I use plot.ts(), then I don't need to change the time format. Can I also do the time series plot in ggplot?

[edit]

This is a sample of the data:

df <- structure(list(Datum = 20160101:20160120, Opbrengst = c(40609276, 
79381098, 114653269, 126044535, 180472785, 169286880, 149272135, 
133645566, 70171285, 150029065, 149172032, 107843808, 138196732, 
136460905, 133595660, 61716435, 137309503, 193201850, 140766980, 
129859068)), .Names = c("Datum", "Opbrengst"), row.names = c(NA, 
20L), class = "data.frame")

[Edit]

Changed %M to %m

There are many ways to do this. Three easy ones:

df <- structure(list(Datum = 20160101:20160120, Opbrengst = c(40609276, 79381098, 114653269, 126044535, 180472785, 169286880, 149272135, 133645566, 70171285, 150029065, 149172032, 107843808, 138196732, 136460905, 133595660, 61716435, 137309503, 193201850, 140766980, 129859068)), .Names = c("Datum", "Opbrengst"), row.names = c(NA, 20L), class = "data.frame")

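# 1. Using as.Date() (as suggested by @SBista) to create a Date object.
#    The integer column is coerced to character first so the format applies to its digits:
df$Datum <- as.Date(as.character(df$Datum), format = "%Y %m %d")

# 2. Or create a POSIXct object (strptime() on its own returns POSIXlt):
# df$Datum <- as.POSIXct(strptime(df$Datum, format = "%Y %m %d"))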

# 3. Using 'lubridate' to create a Date or POSIXct object (see 'tz' argument in ?ymd):
# df$Datum <- lubridate::ymd(df$Datum, tz = NULL)

library(ggplot2)
ggplot(data=df, aes(x=Datum , y=Opbrengst)) +
  geom_line()+
  geom_point()

Results in:

[image]

The problem with your example is that you weren't providing the 'format' argument, so R didn't know that the digits were year-month-day.

The issue here is the conversion of df$Datum to class Date. It has nothing to do with ggplot2.
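To see where the dates in the year 57166 most likely came from, here is a minimal base-R sketch. It assumes the numeric values were read as day counts from 1970-01-01; the origin is written out explicitly because older R versions require it, and the variable names are only illustrative.

```r
# One raw value from the question, stored as a number
x <- 20160101

# Read as "days since 1970-01-01": ends up tens of thousands of years ahead
as_days <- as.Date(x, origin = "1970-01-01")

# Read as text with an explicit year-month-day format: the intended date
as_ymd <- as.Date(as.character(x), format = "%Y%m%d")

print(as_days)
print(as_ymd)
```

The second call is the one that treats the eight digits as a calendar date rather than as a quantity.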

Creating sample data as integer, spanning New Year:

(Datum <- c(20151224:20151231, 20160101:20160107))
 [1] 20151224 20151225 20151226 20151227 20151228 20151229 20151230 20151231 20160101
[10] 20160102 20160103 20160104 20160105 20160106 20160107

anytime::anydate() and lubridate::ymd() seem to be able to convert the integer Datum directly, without coercion to character.

anytime::anydate(Datum)
# [1] "2015-12-24" "2015-12-25" "2015-12-26" "2015-12-27" "2015-12-28" "2015-12-29"
# [7] "2015-12-30" "2015-12-31" "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04"
#[13] "2016-01-05" "2016-01-06" "2016-01-07"

lubridate::ymd(Datum)
# [1] "2015-12-24" "2015-12-25" "2015-12-26" "2015-12-27" "2015-12-28" "2015-12-29"
# [7] "2015-12-30" "2015-12-31" "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04"
#[13] "2016-01-05" "2016-01-06" "2016-01-07"

as.Date() throws errors here:

as.Date(Datum)
#Error in as.Date.numeric(Datum) : 'origin' must be supplied

as.Date(Datum, "%Y%m%d")
#Error in charToDate(x) : 
#  character string is not in a standard unambiguous format

Datum needs to be coerced to character first:

as.Date(as.character(Datum), "%Y%m%d")
# [1] "2015-12-24" "2015-12-25" "2015-12-26" "2015-12-27" "2015-12-28" "2015-12-29"
# [7] "2015-12-30" "2015-12-31" "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04"
#[13] "2016-01-05" "2016-01-06" "2016-01-07"

Note that the format string is "%Y%m%d" with a lowercase m (month), not "%Y%M%d" with a capital M (minutes). Interestingly, "%Y %m %d" with blanks interspersed seems to work as well here.


Full example
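The difference between the format strings can be checked directly. This is a small base-R sketch with an illustrative input string. With %M the "01" is read as minutes, so no month is taken from the input at all; according to ?strptime an unspecified month is filled in from the current date, so the resulting date would depend on when the code is run.

```r
s <- "20160115"

# Correct: %m reads "01" as the month
good <- as.Date(s, format = "%Y%m%d")

# Wrong: %M reads "01" as minutes, so the month is not parsed from the string
parsed <- strptime(s, format = "%Y%M%d", tz = "UTC")
print(parsed$min)  # the "01" ended up in the minutes field

# Blanks in the format are tolerated even though the input contains none
spaced <- as.Date(s, format = "%Y %m %d")

print(good)
print(spaced)
```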

# create data
df <- data.frame(
  Datum = c(20151220:20151231, 20160101:20160108),
  Opbrengst = c(40609276, 79381098, 114653269, 126044535, 180472785, 169286880, 
                149272135, 133645566, 70171285, 150029065, 149172032, 107843808, 
                138196732, 136460905, 133595660, 61716435, 137309503, 193201850, 
                140766980, 129859068))

# coerce to class Date
df$Datum <- anytime::anydate(df$Datum)

library(ggplot2)
ggplot(df, aes(Datum, Opbrengst)) + geom_line() + geom_point()

[image]

Note that the gap over New Year has gone.
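Since the question also mentions weekly predictions, here is a hedged base-R sketch of one possible way to roll the daily series up to weeks once Datum is a proper Date. The Week column and the weekly data frame are hypothetical names, summing is only an assumption about the aggregation wanted, and cut() with breaks = "week" starts weeks on Monday by default.

```r
# Same sample data as in the full example above
df <- data.frame(
  Datum = c(20151220:20151231, 20160101:20160108),
  Opbrengst = c(40609276, 79381098, 114653269, 126044535, 180472785, 169286880,
                149272135, 133645566, 70171285, 150029065, 149172032, 107843808,
                138196732, 136460905, 133595660, 61716435, 137309503, 193201850,
                140766980, 129859068))

# Coerce to class Date using base R only
df$Datum <- as.Date(as.character(df$Datum), format = "%Y%m%d")

# Label each day with the Monday that starts its week (hypothetical column name)
df$Week <- as.Date(cut(df$Datum, breaks = "week"))

# Sum the daily values per week (assumed aggregation)
weekly <- aggregate(Opbrengst ~ Week, data = df, FUN = sum)
print(weekly)
```

Because Week is itself of class Date, the weekly data frame can be plotted with the same ggplot call as the daily data.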