
Starting a daily time series in R

I have a daily time series of the number of visitors to a website. My series starts on 01/06/2014 and runs until today, 14/10/2015, and I want to predict the number of visitors in the future. How can I read my series into R? I'm thinking:

series <- ts(visitors, frequency=365, start=c(2014, 6)) 

If so, after running my time series model arimadata=auto.arima(), I want to predict the number of visitors for the next 60 days. How can I do this?

h=..?
forecast(arimadata,h=..), 

What should the value of h be? Thanks in advance for your help.

The ts specification is wrong; if you are setting this up as daily observations, then you need to work out which day of the year June 1st, 2014 is and specify this in start:

## Create a daily Date object - helps my work on dates
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")

## Create a time series object
set.seed(25)
myts <- ts(rnorm(length(inds)),     # random data
           start = c(2014, as.numeric(format(inds[1], "%j"))),
           frequency = 365)

Note that I specify start as c(2014, as.numeric(format(inds[1], "%j"))). All the complicated bit does is work out which day of the year June 1st is:

> as.numeric(format(inds[1], "%j"))
[1] 152
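A quick base-R check of that number, plus a second way to get it via as.POSIXlt()$yday (which counts from 0). The leap-year line illustrates why computing the day of the year beats hard-coding it: June 1st is day 153 in 2016. The 10-point random series is only a placeholder.

```r
# Day of the year for June 1st, 2014, computed two ways
d <- as.Date("2014-06-01")
doy_strftime <- as.numeric(format(d, "%j"))  # "%j" gives the day of year (001-366)
doy_posixlt  <- as.POSIXlt(d)$yday + 1        # $yday is 0-based, so add 1
c(doy_strftime, doy_posixlt)                  # 152 152

# In a leap year the same calendar date falls one day later
doy_2016 <- as.numeric(format(as.Date("2016-06-01"), "%j"))  # 153

# The computed value goes straight into start = c(year, day)
x <- ts(rnorm(10), frequency = 365, start = c(2014, doy_posixlt))
start(x)
```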

Once you have this, you're effectively there:

library(forecast)   # provides auto.arima() and forecast()

## use auto.arima to choose ARIMA terms
fit <- auto.arima(myts)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
## plot it
plot(fore)


That seems suitable given the random data I supplied...

You'll need to select arguments for auto.arima() that suit your data.

Note that the x-axis is measured in years, so a label such as 2014.5 means halfway (0.5) through that year.
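To relate those fractional-year values back to calendar dates, here is a small sketch. The helper frac_year_to_date() is hypothetical (not part of base R or forecast) and assumes 365-day years, the same convention frequency = 365 uses, so it is only exact when the span contains no 29 February (true for June 2014 to December 2015). The last line also answers the h = 60 question: 60 daily steps after 2015-10-14 end on 2015-12-13.

```r
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")
myts <- ts(seq_along(inds), start = c(2014, 152), frequency = 365)

# Hypothetical helper: map a fractional-year time value back to a Date,
# assuming 365-day years (the convention used by frequency = 365)
frac_year_to_date <- function(t, freq = 365) {
  yr  <- floor(t + 1e-8)                  # guard against tiny floating-point error
  doy <- round((t - yr) * freq) + 1       # 1-based day of the year
  as.Date(paste0(yr, "-01-01")) + (doy - 1)
}

first_day <- frac_year_to_date(tsp(myts)[1])             # first observation
last_day  <- frac_year_to_date(tsp(myts)[2])             # last observation
h_end     <- frac_year_to_date(tsp(myts)[2] + 60 / 365)  # end of a 60-step forecast
frac_year_to_date(2014.5)                                # the "2014.5" axis label
```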

Doing this via zoo

This might be easier to do via a zoo object created using the zoo package:

library(zoo)

## create the zoo object as before
set.seed(25)
myzoo <- zoo(rnorm(length(inds)), inds)

Note that you now don't need to specify any start or frequency info; just use inds, the daily Date sequence computed earlier.

Proceed as before:

## use auto.arima to choose ARIMA terms
fit <- auto.arima(myzoo)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)

The plot, though, will cause an issue, as the x-axis is in days since the epoch (1970-01-01), so we need to suppress the automatic drawing of this axis and then draw our own. This is easy as we have inds:
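To see where the "days since the epoch" axis comes from, this sketch converts the zoo series to a plain ts with as.ts() and inspects its time attributes. It assumes the zoo package is installed; the random values are placeholders.

```r
library(zoo)

inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")
set.seed(25)
myzoo <- zoo(rnorm(length(inds)), inds)

# A regular daily Date index becomes a ts with frequency 1,
# whose time values are the underlying day counts of the Dates
zts <- as.ts(myzoo)
tsp(zts)                # start, end, frequency
as.numeric(inds[1])     # days from 1970-01-01 to 2014-06-01
```

Because the time values are day counts, Axis(inds, side = 1) can place Date labels at the same positions.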

## plot it
plot(fore, xaxt = "n")    # no x-axis 
Axis(inds, side = 1)

This only produces a couple of labeled ticks; if you want more control, tell R where you want the ticks and labels:

## plot it
plot(fore, xaxt = "n")    # no x-axis 
Axis(inds, side = 1,
     at = seq(inds[1], tail(inds, 1) + 60, by = "3 months"),
     format = "%b %Y")

Here we put a tick every 3 months.
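The tick positions can be checked without drawing anything. Note that the sequence runs 60 days past the last observation so the forecast period gets ticks too; the labels produced by format() follow the current locale's month names.

```r
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")

# Same 'at' values as in the Axis() call above
ticks <- seq(inds[1], tail(inds, 1) + 60, by = "3 months")
ticks                     # 2014-06-01, 2014-09-01, ..., 2015-12-01
format(ticks, "%b %Y")    # the tick labels
```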

A ts object does not work well for daily time series. I suggest you use the zoo package.

library(zoo)
zoo(visitors, seq(from = as.Date("2014-06-01"), to = as.Date("2015-10-14"), by = 1))

Here's how I created a time series when I was given some daily observations with quite a few observations missing. @gavin-simpson was a big help. Hopefully this saves someone some grief.

The original data looked something like this:

library(lubridate)
set.seed(42)
minday = as.Date("2001-01-01")
maxday = as.Date("2005-12-31")
dates <- seq(minday, maxday, "days")
dates <- dates[sample(1:length(dates),length(dates)/4)] # create some holes
df <- data.frame(date=sort(dates), val=sin(seq(from=0, to=2*pi, length=length(dates))))

To create a time series from this data, I created a 'dummy' data frame with one row per date and merged it with the existing data frame:

df <- merge(df, data.frame(date=seq(minday, maxday, "days")), all=T)

This data frame can be cast into a time series. Missing dates are NA.

nts <- ts(df$val, frequency=365, start=c(year(minday), as.numeric(format(minday, "%j"))))
plot(nts)

[Figure: a sine wave with holes]
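A base-R sketch of the same merge-and-cast steps (lubridate's year() is swapped for format(), and the values are placeholders) with two checks: the NA count equals the number of missing days, and the series ends at c(2006, 1) rather than in 2005. That happens because frequency = 365 assumes 365-day years while 2004 has 366 days, so each leap day shifts the ts calendar by one position.

```r
set.seed(42)
minday <- as.Date("2001-01-01")
maxday <- as.Date("2005-12-31")
all_days <- seq(minday, maxday, "days")                  # 1826 days, incl. 29 Feb 2004
kept <- sort(sample(all_days, length(all_days) %/% 4))  # keep a quarter of the days
df <- data.frame(date = kept, val = seq_along(kept))

# One row per calendar day; days absent from df get val = NA
full <- merge(df, data.frame(date = all_days), all = TRUE)
nts <- ts(full$val, frequency = 365,
          start = c(as.numeric(format(minday, "%Y")), as.numeric(format(minday, "%j"))))

sum(is.na(nts))   # number of filled-in days
end(nts)          # c(2006, 1): the leap day pushes the end out by one
```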

Here is a step-by-step guide to forecasting daily data with multiple seasonality in R. Unless the time series is very long, the easiest approach is to simply set the frequency attribute to 7.

y <- ts(x, frequency=7)

Then any of the usual time series forecasting methods should produce reasonable forecasts. For example:

library(forecast)
fit <- ets(y)
fc <- forecast(fit)
plot(fc)
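To see what frequency = 7 buys you without the forecast package, this sketch swaps in base R's classical decompose() on simulated data with a hypothetical weekday/weekend shape. The seasonal figure it returns has one value per position in the 7-day cycle; position 1 is whatever weekday the series starts on, not necessarily Monday.

```r
set.seed(1)
n_weeks <- 20
# Hypothetical daily data: a weekday/weekend pattern plus noise
weekly_shape <- c(10, 12, 12, 11, 9, 3, 2)
x <- rep(weekly_shape, n_weeks) + rnorm(7 * n_weeks, sd = 1)
y <- ts(x, frequency = 7)

dc <- decompose(y)     # classical additive decomposition
round(dc$figure, 1)    # estimated day-of-cycle effects, centred on zero
```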

When the time series is long enough to span more than a year, it may be necessary to allow for annual seasonality as well as weekly seasonality. In that case, a multiple seasonal model such as TBATS is required.

y <- msts(x, seasonal.periods=c(7,365.25))
fit <- tbats(y)
fc <- forecast(fit)
plot(fc)

This should capture the weekly pattern as well as the longer annual pattern. The period 365.25 is the average length of a year allowing for leap years. In some countries, alternative or additional year lengths may be necessary.
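The 365.25 figure is simple arithmetic over a four-year block containing one leap year, here 2001 to 2004, checked with base R date subtraction:

```r
# Days from 2001-01-01 up to (not including) 2005-01-01: 3 * 365 + 366
days <- as.numeric(as.Date("2005-01-01") - as.Date("2001-01-01"))
days       # 1461
days / 4   # 365.25
```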

Capturing seasonality associated with moving events such as Easter or the Chinese New Year is more difficult. Even with monthly data, this can be tricky as the festivals can fall in either March or April (for Easter) or in January or February (for the Chinese New Year). The usual seasonal models don't allow for this. The best way to deal with moving holiday effects is to use dummy variables. However, neither ETS nor TBATS models allow for covariates. A state space model of the same form as TBATS but with multiple sources of error and covariates could be used, but we don't have any R code to do that.

Instead, we can use a regression model with ARIMA errors, where the regression terms include any dummy holiday effects as well as the longer annual seasonality. Unless there are many decades of data, it is usually reasonable to assume that the annual seasonal shape is unchanged from year to year, and so Fourier terms can be used to model the annual seasonality. Suppose we use Fourier terms up to order K=5 to model annual seasonality, and that the holiday dummy variables are in the vector holiday, with 100 future values in holidayf. Then the following code will fit an appropriate model.

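y <- ts(x, frequency=7)
# Fourier terms for the annual seasonality: in-sample, and for the next 100 days
z <- fourier(ts(x, frequency=365.25), K=5)
zf <- fourier(ts(x, frequency=365.25), K=5, h=100)
# name the holiday column identically in the fitting and forecasting regressors
fit <- auto.arima(y, xreg=cbind(z, holiday=holiday), seasonal=FALSE)
fc <- forecast(fit, xreg=cbind(zf, holiday=holidayf), h=100)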
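The code above assumes holiday and holidayf already exist. One way to build them in base R is sketched below, under loudly hypothetical choices: the observed series is assumed to run from 2014-01-01 to 2015-12-31, the holiday is Easter (dates entered by hand), and the effect window is Good Friday through Easter Monday. A real application would use its own event dates and window.

```r
# Hypothetical observed range and a 100-day forecast horizon
obs_days <- seq(as.Date("2014-01-01"), as.Date("2015-12-31"), by = "day")
fut_days <- max(obs_days) + seq_len(100)

# Easter Sunday dates, entered by hand for this sketch
easter <- as.Date(c("2014-04-20", "2015-04-05", "2016-03-27"))
# Assumed effect window: Good Friday (-2 days) to Easter Monday (+1 day)
holiday_days <- do.call(c, lapply(easter, function(d) d + (-2):1))

# 0/1 dummies aligned with the observed and future days
holiday  <- as.numeric(obs_days %in% holiday_days)
holidayf <- as.numeric(fut_days %in% holiday_days)
```

Because the dummies are built from dates rather than positions, an event that moves between March and April is handled automatically.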

The order K can be chosen by minimizing the AIC of the fitted model.
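A base-R sketch of both ideas, building Fourier regressors and choosing K by AIC. Two substitutions are assumptions for illustration: fourier_terms() is a hypothetical stand-in that builds the same kind of sine/cosine pairs as forecast::fourier(), and lm() stands in for auto.arima() with xreg, so the errors are treated as independent rather than ARIMA. The simulated series uses only the first two annual harmonics.

```r
set.seed(123)
n  <- 3 * 365          # three years of hypothetical daily data
tt <- seq_len(n)
m  <- 365.25           # annual period in days

# Hypothetical helper: K sine/cosine pairs of period m, columns S1, C1, S2, C2, ...
fourier_terms <- function(tt, K, m) {
  X <- do.call(cbind, lapply(seq_len(K), function(k)
    cbind(sin(2 * pi * k * tt / m), cos(2 * pi * k * tt / m))))
  colnames(X) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2))
  X
}

# Simulated series: annual shape from harmonics 1 and 2, plus noise
x <- 3 * sin(2 * pi * tt / m) + 1.5 * cos(2 * pi * 2 * tt / m) + rnorm(n, sd = 0.5)

# Fit K = 1..6 and keep the K with the smallest AIC
aics   <- sapply(1:6, function(K) AIC(lm(x ~ fourier_terms(tt, K, m))))
best_K <- which.min(aics)
fit    <- lm(x ~ fourier_terms(tt, best_K, m))
coef(fit)[2]   # estimated S1 coefficient, close to the true value of 3
```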

series <- ts(visitors, frequency=365, start=c(2014, 152)) 

Day 152 is 01-06-2014: with frequency=365, the series starts at day 152 of 2014. To predict 60 days ahead, use h=60.

forecast(arimadata , h=60)

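Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.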

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM