
starting a daily time series in R

I have a daily time series of the number of visitors to a web site. The series runs from 01/06/2014 until today, 14/10/2015, and I want to forecast the number of visitors in the future. How can I read my series into R? I'm thinking of:

series <- ts(visitors, frequency=365, start=c(2014, 6)) 

If that is right, then after fitting my time series model with arimadata=auto.arima(), I want to predict the number of visitors for the next 60 days. How can I do this?

h=..?
forecast(arimadata,h=..), 

What should the value of h be? Thanks in advance for your help.

The ts specification is wrong; if you are setting this up as daily observations with frequency = 365, then you need to work out which day of the year June 1st 2014 is and pass that as the second element of start:

## Create a daily Date object - helps my work on dates
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")

## Create a time series object
set.seed(25)
myts <- ts(rnorm(length(inds)),     # random data
           start = c(2014, as.numeric(format(inds[1], "%j"))),
           frequency = 365)

Note that I specify start as c(2014, as.numeric(format(inds[1], "%j"))). All the complicated bit is doing is working out which day of the year June 1st is:

> as.numeric(format(inds[1], "%j"))
[1] 152
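As a quick sanity check of that start value, here is a minimal base R sketch. The vector x is a placeholder (an assumption) standing in for the real visitor counts; the checks only confirm that the series length matches the number of days and that the first time stamp sits 151/365 of the way into 2014.

```r
# Daily dates covering the observation window from the question
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")

# Day of year of the first observation ("%j" returns it as text)
doy <- as.numeric(format(inds[1], "%j"))

# Placeholder data (assumption): replace with the real visitor counts
x <- seq_along(inds)

myts <- ts(x, start = c(2014, doy), frequency = 365)

length(inds)    # number of daily observations in the window
start(myts)     # year and position within the year of the first value
time(myts)[1]   # same start expressed as a decimal year
```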

Once you have this, you're effectively there:

library(forecast)
## use auto.arima to choose ARIMA terms
fit <- auto.arima(myts)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
## plot it
plot(fore)


That seems suitable given the random data I supplied...

You'll need to select appropriate arguments for auto.arima() as suits your data.

Note that the x-axis labels refer to 0.5 (half) of a year.

Doing this via zoo

This might be easier to do via a zoo object created using the zoo package:

library(zoo)
## create the zoo object from the same data and dates as before
set.seed(25)
myzoo <- zoo(rnorm(length(inds)), inds)

Note you now don't need to specify any start or frequency info; just use inds computed earlier from the daily Date object.

Proceed as before

## use auto.arima to choose ARIMA terms
fit <- auto.arima(myzoo)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)

The plot, though, will cause an issue because the x-axis is in days since the epoch (1970-01-01), so we need to suppress the automatic plotting of this axis and then draw our own. This is easy as we have inds:

## plot it
plot(fore, xaxt = "n")    # no x-axis 
Axis(inds, side = 1)

This only produces a couple of labeled ticks; if you want more control, tell R where you want the ticks and labels:

## plot it
plot(fore, xaxt = "n")    # no x-axis 
Axis(inds, side = 1,
     at = seq(inds[1], tail(inds, 1) + 60, by = "3 months"),
     format = "%b %Y")

Here we plot every 3 months.
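To see which tick positions that seq() call produces before passing them to Axis(), the dates can be generated on their own. This is a small base R sketch that only reuses the window and the 60-day horizon from above; nothing here depends on the fitted model.

```r
# Observation window from the question
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")

# Ticks every 3 months from the first observation to the end of the
# 60-day forecast horizon
ticks <- seq(inds[1], tail(inds, 1) + 60, by = "3 months")

ticks                    # the Date values used for the `at` argument
format(ticks, "%b %Y")   # the labels (month names depend on the locale)
```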

A ts object does not work well for daily time series. I suggest you use the zoo package instead.

library(zoo)
zoo(visitors, seq(from = as.Date("2014-06-01"), to = as.Date("2015-10-14"), by = 1))

Here's how I created a time series when I was given some daily observations with quite a few observations missing. @gavin-simpson was a big help. Hopefully this saves someone some grief.

The original data looked something like this:

library(lubridate)
set.seed(42)
minday = as.Date("2001-01-01")
maxday = as.Date("2005-12-31")
dates <- seq(minday, maxday, "days")
dates <- dates[sample(1:length(dates),length(dates)/4)] # create some holes
df <- data.frame(date=sort(dates), val=sin(seq(from=0, to=2*pi, length=length(dates))))

To create a time-series with this data I created a 'dummy' dataframe with one row per date and merged that with the existing dataframe:

df <- merge(df, data.frame(date=seq(minday, maxday, "days")), all=TRUE)

This data frame can then be converted into a time series. Dates that were missing from the original data have the value NA.

nts <- ts(df$val, frequency=365, start=c(year(minday), as.numeric(format(minday, "%j"))))
plot(nts)

[Figure: sine wave with holes]
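The effect of the merge is easier to see on a tiny example. The sketch below uses made-up toy data (an assumption, not the data above) with two missing days, and shows that merging against a complete date sequence with all = TRUE leaves NA in the gaps.

```r
# Toy data (assumption): three observed days out of a five-day range
obs <- data.frame(date = as.Date(c("2001-01-01", "2001-01-03", "2001-01-05")),
                  val  = c(10, 30, 50))

# One row per calendar day in the full range
full <- data.frame(date = seq(min(obs$date), max(obs$date), by = "day"))

# all = TRUE keeps every date; days without an observation get NA in val
filled <- merge(obs, full, all = TRUE)

filled                    # five rows, sorted by date
sum(is.na(filled$val))    # number of gaps that were filled with NA
```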

Here is a step by step guide to forecast daily data with multiple seasonality in R. Unless the time series is very long, the easiest approach is to simply set the frequency attribute to 7.

y <- ts(x, frequency=7)

Then any of the usual time series forecasting methods should produce reasonable forecasts. For example

library(forecast)
fit <- ets(y)
fc <- forecast(fit)
plot(fc)

When the time series is long enough to take in more than a year, then it may be necessary to allow for annual seasonality as well as weekly seasonality. In that case, a multiple seasonal model such as TBATS is required.

y <- msts(x, seasonal.periods=c(7,365.25))
fit <- tbats(y)
fc <- forecast(fit)
plot(fc)

This should capture the weekly pattern as well as the longer annual pattern. The period 365.25 is the average length of a year allowing for leap years. In some countries, alternative or additional year lengths may be necessary.

Capturing seasonality associated with moving events such as Easter or the Chinese New Year is more difficult. Even with monthly data, this can be tricky as the festivals can fall in either March or April (for Easter) or in January or February (for the Chinese New Year). The usual seasonal models don't allow for this. The best way to deal with moving holiday effects is to use dummy variables. However, neither ETS nor TBATS models allow for covariates. A state space model of the same form as TBATS but with multiple sources of error and covariates could be used, but we don't have any R code to do that.

Instead, we can use a regression model with ARIMA errors, where the regression terms include any dummy holiday effects as well as the longer annual seasonality. Unless there are many decades of data, it is usually reasonable to assume that the annual seasonal shape is unchanged from year to year, and so Fourier terms can be used to model the annual seasonality. Suppose we use K=5 Fourier terms to model annual seasonality, and that the holiday dummy variables are in the vector holiday with 100 future values in holidayf. Then the following code will fit an appropriate model.

y <- ts(x, frequency=7)
z <- fourier(ts(x, frequency=365.25), K=5)
zf <- fourier(ts(x, frequency=365.25), K=5, h=100)  # fourierf() is deprecated; fourier() takes h directly
fit <- auto.arima(y, xreg=cbind(z,holiday), seasonal=FALSE)
fc <- forecast(fit, xreg=cbind(zf,holidayf), h=100)

The order K can be chosen by minimizing the AIC of the fitted model.
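As a rough illustration of choosing K by AIC, here is a self-contained sketch that avoids the forecast package. It is not the code above: the Fourier columns are built by a hypothetical helper fourier_terms(), the data are simulated, and a fixed AR(1) error model from stats::arima() stands in for auto.arima(). All of these are assumptions made so the example runs with base R only.

```r
# Hypothetical helper: K sine/cosine pairs for a given seasonal period
fourier_terms <- function(day, K, period) {
  cols <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * day / period), cos(2 * pi * k * day / period))
  })
  X <- do.call(cbind, cols)
  colnames(X) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2))
  X
}

# Simulated daily data (assumption): two annual harmonics plus AR(1) noise
set.seed(1)
n   <- 730
day <- seq_len(n)
y   <- 100 + 20 * sin(2 * pi * day / 365.25) + 8 * cos(4 * pi * day / 365.25) +
  as.numeric(arima.sim(list(ar = 0.5), n = n))

# Fit a regression with AR(1) errors for each candidate K and record the AIC
candidates <- 1:4
aic <- sapply(candidates, function(K) {
  fit <- arima(y, order = c(1, 0, 0),
               xreg = fourier_terms(day, K, 365.25), method = "ML")
  fit$aic
})

best_K <- candidates[which.min(aic)]
data.frame(K = candidates, AIC = aic)
best_K
```

With real data the same loop could wrap the auto.arima() call shown earlier, comparing the AIC (or AICc) of each fitted model across values of K.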

series <- ts(visitors, frequency=365, start=c(2014, 152)) 

01-06-2014 is day 152 of the year, so with frequency=365 the series starts at position 152 of 2014. To forecast 60 days ahead, set h=60.

forecast(arimadata, h=60)
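To attach calendar dates to those 60 forecasts, the future dates can be generated from the last observed day. This is a base R sketch; last_obs and future_dates are names introduced here for illustration.

```r
# Last observed day in the question's series
last_obs <- as.Date("2015-10-14")
h <- 60

# One date per forecast step, starting the day after the last observation
future_dates <- seq(last_obs + 1, by = "day", length.out = h)

range(future_dates)   # first and last forecast dates
```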
