
Properly Creating a Time Series in R, auto.arima Function on Daily Data

I am creating a time series of daily sales of a given item at a retailer. I have several questions, outlined below, that I would like some help with (data and code to follow). Note that I am reading my actual data set from a csv file, where the observations (dates) are in rows and the variables are in columns. Thank you ahead of time for your help, and please know that I am new to R coding.

1) It appears as if R is reading my time series by the observation number of the date (i.e., April 5th, the 5th date in the data set, has a value of 5 rather than the 297 units that sold on that particular day). How can I remedy this?

2) I believe that my 'ts' statement is telling R that the data begins on the 91st day (April 1st) of 2013; have I coded this correctly? When I plot the data, it appears that R may be interpreting this statement differently.

3) Do I need to create a separate time series for my xreg? For example, should I create a time series for each variable, then take the union of those, and then cbind them?

4) Have I logged the variables in the correct statements, or should I do it elsewhere in the code?

require("forecast")
G<-read.csv("SingleItemToyDataset.csv")
GT<-ts(G$Units, start = c(2013, 91), frequency = 365.25)
X = cbind(log(G$Price), G$Time, as.factor(G$PromoOne), as.factor(G$PromoTwo), as.factor(G$Mon), as.factor(G$Tue), as.factor(G$Wed), as.factor(G$Thu), as.factor(G$Fri), as.factor(G$Sat))
Fit<-auto.arima(log(GT), xreg = X)


        Date Day Units Price Time PromoOne PromoTwo Mon Tue Wed Thu Fri Sat
1   4/1/2013 Mon   351  5.06    1        1        0   1   0   0   0   0   0
2   4/2/2013 Tue   753  4.90    2        1        0   0   1   0   0   0   0
3   4/3/2013 Wed   133  5.32    3        1        0   0   0   1   0   0   0
4   4/4/2013 Thu   150  5.14    4        1        0   0   0   0   1   0   0
5   4/5/2013 Fri   297  5.00    5        1        0   0   0   0   0   1   0
6   4/6/2013 Sat   688  5.27    6        1        0   0   0   0   0   0   1
7   4/7/2013 Sun 1,160  5.06    7        1        0   0   0   0   0   0   0
8   4/8/2013 Mon   613  5.07    8        1        0   1   0   0   0   0   0
9   4/9/2013 Tue   430  5.07    9        1        0   0   1   0   0   0   0
10 4/10/2013 Wed   400  5.03   10        1        0   0   0   1   0   0   0
11 4/11/2013 Thu 1,530  4.97   11        1        0   0   0   0   1   0   0
12 4/12/2013 Fri 2,119  5.00   12        0        1   0   0   0   0   1   0
13 4/13/2013 Sat 1,094  5.09   13        0        1   0   0   0   0   0   1
14 4/14/2013 Sun   736  5.02   14        1        0   0   0   0   0   0   0
15 4/15/2013 Mon   518  5.10   15        1        0   1   0   0   0   0   0
16 4/16/2013 Tue   485  5.02   16        1        0   0   1   0   0   0   0
17 4/17/2013 Wed   472  5.05   17        1        0   0   0   1   0   0   0
18 4/18/2013 Thu   406  5.03   18        1        0   0   0   0   1   0   0
19 4/19/2013 Fri   564  5.00   19        1        0   0   0   0   0   1   0
20 4/20/2013 Sat   475  5.09   20        1        0   0   0   0   0   0   1
21 4/21/2013 Sun   621  5.04   21        1        0   0   0   0   0   0   0
22 4/22/2013 Mon   714  5.02   22        1        0   1   0   0   0   0   0
23 4/23/2013 Tue 1,217  5.32   23        0        0   0   1   0   0   0   0
24 4/24/2013 Wed 1,253  5.45   24        0        0   0   0   1   0   0   0
25 4/25/2013 Thu 1,169  5.06   25        0        0   0   0   0   1   0   0
26 4/26/2013 Fri 1,216  5.01   26        0        0   0   0   0   0   1   0
27 4/27/2013 Sat 1,127  5.02   27        0        0   0   0   0   0   0   1
28 4/28/2013 Sun   693  5.04   28        1        0   0   0   0   0   0   0
29 4/29/2013 Mon   388  5.01   29        1        0   1   0   0   0   0   0
30 4/30/2013 Tue   305  5.01   30        1        0   0   1   0   0   0   0
31  5/1/2013 Wed   207  5.03   31        1        0   0   0   1   0   0   0
32  5/2/2013 Thu   612  4.97   32        1        0   0   0   0   1   0   0
33  5/3/2013 Fri   671  5.01   33        1        0   0   0   0   0   1   0
34  5/4/2013 Sat 1,151  5.04   34        1        0   0   0   0   0   0   1
35  5/5/2013 Sun 2,578  5.00   35        1        0   0   0   0   0   0   0
36  5/6/2013 Mon 2,364  5.01   36        1        0   1   0   0   0   0   0
37  5/7/2013 Tue   423  5.03   37        1        0   0   1   0   0   0   0
38  5/8/2013 Wed   388  5.04   38        1        0   0   0   1   0   0   0
39  5/9/2013 Thu 1,417  4.70   39        0        1   0   0   0   1   0   0
40 5/10/2013 Fri 1,607  4.59   40        0        1   0   0   0   0   1   0
41 5/11/2013 Sat 1,217  4.86   41        1        0   0   0   0   0   0   1
42 5/12/2013 Sun   545  5.12   42        1        0   0   0   0   0   0   0
43 5/13/2013 Mon   461  5.01   43        1        0   1   0   0   0   0   0
44 5/14/2013 Tue   358  4.97   44        1        0   0   1   0   0   0   0
45 5/15/2013 Wed   310  5.00   45        1        0   0   0   1   0   0   0
46 5/16/2013 Thu   925  4.63   46        1        0   0   0   0   1   0   0
47 5/17/2013 Fri   266  4.99   47        1        0   0   0   0   0   1   0
48 5/18/2013 Sat   183  5.15   48        0        0   0   0   0   0   0   1
49 5/19/2013 Sun   363  5.20   49        0        0   0   0   0   0   0   0
50 5/20/2013 Mon 5,469  4.99   50        1        0   1   0   0   0   0   0
51 5/21/2013 Tue   647  4.81   51        1        0   0   1   0   0   0   0
52 5/22/2013 Wed   421  4.97   52        1        0   0   0   1   0   0   0
53 5/23/2013 Thu   353  4.93   53        1        0   0   0   0   1   0   0
54 5/24/2013 Fri   375  4.95   54        1        0   0   0   0   0   1   0
55 5/25/2013 Sat   575  4.88   55        1        0   0   0   0   0   0   1
56 5/26/2013 Sun   707  4.92   56        0        0   0   0   0   0   0   0
57 5/27/2013 Mon   533  4.89   57        0        0   1   0   0   0   0   0
58 5/28/2013 Tue   641  4.66   58        0        0   0   1   0   0   0   0
59 5/29/2013 Wed   264  4.85   59        0        0   0   0   1   0   0   0
60 5/30/2013 Thu   186  5.74   60        1        0   0   0   0   1   0   0
61 5/31/2013 Fri   207  6.40   61        1        0   0   0   0   0   1   0

1) I'm not sure exactly what you mean here, but perhaps you are confused by the row names (numbers in this case) that R has assigned to your data frame G. Assuming the data.frame printed below your code is what G looks like, it looks to me like G$Units does indeed have the data you're interested in modeling. Note, however, that R is probably reading G$Units as a character (or factor) column rather than a numeric one because of the commas in the larger numbers; you should remove those commas from your .csv file.
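
If editing the .csv by hand is inconvenient, the commas can also be stripped after reading the file. The following is a minimal sketch, not the asker's actual file: the four rows are copied from the table in the question, and the inline `csv_text` string stands in for `SingleItemToyDataset.csv`. It assumes the thousands separators are quoted in the file, as a spreadsheet export would normally write them.

```r
# A few rows from the question's table, with the commas left in.
# csv_text is a stand-in for the real file (an assumption for illustration).
csv_text <- 'Date,Units
4/5/2013,297
4/6/2013,688
4/7/2013,"1,160"
4/11/2013,"1,530"'

G <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# "1,160" cannot be parsed as a number, so the whole column comes in as text.
units_class_before <- class(G$Units)

# Remove the thousands separators, then convert to numeric.
G$Units <- as.numeric(gsub(",", "", G$Units))
```

In versions of R before 4.0, `read.csv()` turned such a column into a factor by default, and converting a factor directly to numbers yields its internal integer codes rather than the printed values. That may be related to the odd values described in question 1, although the question does not give enough detail to be sure.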

2) For modeling with auto.arima() (or arima() in base R), the univariate series does not need to be an actual ts object, so you don't really need to create GT. That said, the start and freq arguments to ts() can be a bit odd to figure out. In this case, you need to set freq=365 even though a year is technically a bit longer (i.e., GT <- ts(G$Units, start=c(2013,91), freq=365)).
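
As a rough check on the `start` argument, the day of the year can be computed from the first date instead of being counted by hand. This is only a sketch: the five values are the first five rows of the question's table, and `start_year` and `start_day` are names introduced here for illustration.

```r
# First five dates and unit counts from the question's table.
dates <- as.Date(c("4/1/2013", "4/2/2013", "4/3/2013", "4/4/2013", "4/5/2013"),
                 format = "%m/%d/%Y")
units <- c(351, 753, 133, 150, 297)

# Derive the start position from the first date:
# "%Y" is the four-digit year, "%j" is the day of the year (001-366).
start_year <- as.integer(format(dates[1], "%Y"))
start_day  <- as.integer(format(dates[1], "%j"))

# Daily series with 365 observations per "year", starting at day 91 of 2013.
GT <- ts(units, start = c(start_year, start_day), frequency = 365)

# The first time index is the year plus the fraction of the year already elapsed.
first_time <- as.numeric(time(GT))[1]
```

Here `start_day` comes out as 91, which agrees with the value used in the question, and the fifth element of `GT` is still 297, the units sold on April 5th.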

3) No, you do not need to create a separate time series for xreg. In fact, you don't need to create factors for your promos/days because they are already coded as 0/1. Thus, something like X <- G[,-c(1,2,3,5)]; X$Price <- log(X$Price) would suffice. (Aside: why are you using Time as a covariate? There doesn't appear to be any trend in the data.)
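
The column selection above can be sketched on the first seven rows of the question's table. The inline `csv_text` string is a stand-in for the real file, and the final `as.matrix()` call is an addition that is not part of the original answer: as far as I know, recent versions of the forecast package expect `xreg` to be a numeric matrix rather than a data frame, so treat that step as an assumption about the installed version.

```r
# First week of the question's data (rows 1-7), used only for illustration.
csv_text <- 'Date,Day,Units,Price,Time,PromoOne,PromoTwo,Mon,Tue,Wed,Thu,Fri,Sat
4/1/2013,Mon,351,5.06,1,1,0,1,0,0,0,0,0
4/2/2013,Tue,753,4.90,2,1,0,0,1,0,0,0,0
4/3/2013,Wed,133,5.32,3,1,0,0,0,1,0,0,0
4/4/2013,Thu,150,5.14,4,1,0,0,0,0,1,0,0
4/5/2013,Fri,297,5.00,5,1,0,0,0,0,0,1,0
4/6/2013,Sat,688,5.27,6,1,0,0,0,0,0,0,1
4/7/2013,Sun,"1,160",5.06,7,1,0,0,0,0,0,0,0'

G <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Drop Date (1), Day (2), Units (3) and Time (5); keep Price, promos and day dummies.
X <- G[, -c(1, 2, 3, 5)]

# Log-transform the price, as in the question.
X$Price <- log(X$Price)

# Assumption: convert to a numeric matrix before passing it as xreg.
X <- as.matrix(X)

# With the full data set, the fit from the question would then be:
# Fit <- auto.arima(log(GT), xreg = X)
```

The 0/1 columns pass through unchanged, so no `as.factor()` calls are needed. Sunday has no dummy column of its own and acts as the baseline day.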

4) Yes, log-transforming the (co)variates where you did is fine, but I'm curious as to why the price covariate needs to be log-transformed?
