I am creating a time series of daily sales of a given item at a retailer. I have several questions outlined below that I would like some help with (data and code follow). Note that I am reading my actual data set from a .csv file, where observations (dates) are in rows and each variable is in a column. Thank you ahead of time for your help, and please know that I am new to R coding.
1) It appears as if R is reading my time series by the observation number of the date (i.e., April 5th, the 5th date in the data set, has a value of 5 rather than the 297 units that sold on that day). How can I remedy this?
2) I believe that my 'ts' statement is telling R that the data begins on the 91st day (April 1st) of 2013; have I coded this correctly? When I plot the data, it appears that R may be interpreting this statement differently.
3) Do I need to create a separate time series for my xreg? For example, should I create a time series for each variable, then take the union of those, and then cbind them?
4) Have I logged the variables in the correct statements, or should I do it elsewhere in the code?
require("forecast")
G<-read.csv("SingleItemToyDataset.csv")
GT<-ts(G$Units, start = c(2013, 91), frequency = 365.25)
X = cbind(log(G$Price), G$Time, as.factor(G$PromoOne), as.factor(G$PromoTwo), as.factor(G$Mon), as.factor(G$Tue), as.factor(G$Wed), as.factor(G$Thu), as.factor(G$Fri), as.factor(G$Sat))
Fit<-auto.arima(log(GT), xreg = X)
Date Day Units Price Time PromoOne PromoTwo Mon Tue Wed Thu Fri Sat
1 4/1/2013 Mon 351 5.06 1 1 0 1 0 0 0 0 0
2 4/2/2013 Tue 753 4.90 2 1 0 0 1 0 0 0 0
3 4/3/2013 Wed 133 5.32 3 1 0 0 0 1 0 0 0
4 4/4/2013 Thu 150 5.14 4 1 0 0 0 0 1 0 0
5 4/5/2013 Fri 297 5.00 5 1 0 0 0 0 0 1 0
6 4/6/2013 Sat 688 5.27 6 1 0 0 0 0 0 0 1
7 4/7/2013 Sun 1,160 5.06 7 1 0 0 0 0 0 0 0
8 4/8/2013 Mon 613 5.07 8 1 0 1 0 0 0 0 0
9 4/9/2013 Tue 430 5.07 9 1 0 0 1 0 0 0 0
10 4/10/2013 Wed 400 5.03 10 1 0 0 0 1 0 0 0
11 4/11/2013 Thu 1,530 4.97 11 1 0 0 0 0 1 0 0
12 4/12/2013 Fri 2,119 5.00 12 0 1 0 0 0 0 1 0
13 4/13/2013 Sat 1,094 5.09 13 0 1 0 0 0 0 0 1
14 4/14/2013 Sun 736 5.02 14 1 0 0 0 0 0 0 0
15 4/15/2013 Mon 518 5.10 15 1 0 1 0 0 0 0 0
16 4/16/2013 Tue 485 5.02 16 1 0 0 1 0 0 0 0
17 4/17/2013 Wed 472 5.05 17 1 0 0 0 1 0 0 0
18 4/18/2013 Thu 406 5.03 18 1 0 0 0 0 1 0 0
19 4/19/2013 Fri 564 5.00 19 1 0 0 0 0 0 1 0
20 4/20/2013 Sat 475 5.09 20 1 0 0 0 0 0 0 1
21 4/21/2013 Sun 621 5.04 21 1 0 0 0 0 0 0 0
22 4/22/2013 Mon 714 5.02 22 1 0 1 0 0 0 0 0
23 4/23/2013 Tue 1,217 5.32 23 0 0 0 1 0 0 0 0
24 4/24/2013 Wed 1,253 5.45 24 0 0 0 0 1 0 0 0
25 4/25/2013 Thu 1,169 5.06 25 0 0 0 0 0 1 0 0
26 4/26/2013 Fri 1,216 5.01 26 0 0 0 0 0 0 1 0
27 4/27/2013 Sat 1,127 5.02 27 0 0 0 0 0 0 0 1
28 4/28/2013 Sun 693 5.04 28 1 0 0 0 0 0 0 0
29 4/29/2013 Mon 388 5.01 29 1 0 1 0 0 0 0 0
30 4/30/2013 Tue 305 5.01 30 1 0 0 1 0 0 0 0
31 5/1/2013 Wed 207 5.03 31 1 0 0 0 1 0 0 0
32 5/2/2013 Thu 612 4.97 32 1 0 0 0 0 1 0 0
33 5/3/2013 Fri 671 5.01 33 1 0 0 0 0 0 1 0
34 5/4/2013 Sat 1,151 5.04 34 1 0 0 0 0 0 0 1
35 5/5/2013 Sun 2,578 5.00 35 1 0 0 0 0 0 0 0
36 5/6/2013 Mon 2,364 5.01 36 1 0 1 0 0 0 0 0
37 5/7/2013 Tue 423 5.03 37 1 0 0 1 0 0 0 0
38 5/8/2013 Wed 388 5.04 38 1 0 0 0 1 0 0 0
39 5/9/2013 Thu 1,417 4.70 39 0 1 0 0 0 1 0 0
40 5/10/2013 Fri 1,607 4.59 40 0 1 0 0 0 0 1 0
41 5/11/2013 Sat 1,217 4.86 41 1 0 0 0 0 0 0 1
42 5/12/2013 Sun 545 5.12 42 1 0 0 0 0 0 0 0
43 5/13/2013 Mon 461 5.01 43 1 0 1 0 0 0 0 0
44 5/14/2013 Tue 358 4.97 44 1 0 0 1 0 0 0 0
45 5/15/2013 Wed 310 5.00 45 1 0 0 0 1 0 0 0
46 5/16/2013 Thu 925 4.63 46 1 0 0 0 0 1 0 0
47 5/17/2013 Fri 266 4.99 47 1 0 0 0 0 0 1 0
48 5/18/2013 Sat 183 5.15 48 0 0 0 0 0 0 0 1
49 5/19/2013 Sun 363 5.20 49 0 0 0 0 0 0 0 0
50 5/20/2013 Mon 5,469 4.99 50 1 0 1 0 0 0 0 0
51 5/21/2013 Tue 647 4.81 51 1 0 0 1 0 0 0 0
52 5/22/2013 Wed 421 4.97 52 1 0 0 0 1 0 0 0
53 5/23/2013 Thu 353 4.93 53 1 0 0 0 0 1 0 0
54 5/24/2013 Fri 375 4.95 54 1 0 0 0 0 0 1 0
55 5/25/2013 Sat 575 4.88 55 1 0 0 0 0 0 0 1
56 5/26/2013 Sun 707 4.92 56 0 0 0 0 0 0 0 0
57 5/27/2013 Mon 533 4.89 57 0 0 1 0 0 0 0 0
58 5/28/2013 Tue 641 4.66 58 0 0 0 1 0 0 0 0
59 5/29/2013 Wed 264 4.85 59 0 0 0 0 1 0 0 0
60 5/30/2013 Thu 186 5.74 60 1 0 0 0 0 1 0 0
61 5/31/2013 Fri 207 6.40 61 1 0 0 0 0 0 1 0
1) I'm not sure exactly what you mean here, but perhaps you are confused by the row names (numbers, in this case) that R has assigned to your data frame G. Assuming the data frame printed below your code is what G looks like, G$Units does indeed hold the data you're interested in modeling. Note, however, that R is probably treating G$Units as a character column because of the commas in the numbers; you should remove those from your .csv file.
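A minimal sketch of that cleanup in R, in case you'd rather fix it after import than edit the file; units_raw here is a hypothetical stand-in for a G$Units column that came in as character:

```r
# Hypothetical stand-in for a Units column that read.csv() imported as
# character because of thousands separators like "1,160".
units_raw <- c("351", "1,160", "2,119")

# Strip the commas, then convert to numeric.
units <- as.numeric(gsub(",", "", units_raw))
```

Alternatively, remove the commas in the .csv itself so the column parses as numeric on import.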
2) For modeling with auto.arima() (or arima() in base R), the univariate series does not need to be an actual ts object, so you don't really need to create GT. That said, the start and frequency arguments to ts() can be a bit odd to figure out. In this case, you should set frequency=365 even though a year is technically a bit longer (i.e., GT <- ts(G$Units, start=c(2013,91), frequency=365)).
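To see how start and frequency interact, here is a small sketch with synthetic data (y is a stand-in for G$Units):

```r
# With frequency = 365, start = c(2013, 91) means the 91st "day" of 2013.
y <- rnorm(61)  # synthetic stand-in for G$Units
GT <- ts(y, start = c(2013, 91), frequency = 365)

start(GT)    # 2013, 91
time(GT)[1]  # 2013 + 90/365, i.e. April 1st as a fraction of the year
```

With frequency = 365.25, the same start = c(2013, 91) would be measured against a 365.25-day "year", which is likely why your plotted dates looked off.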
3) No, you do not need to create a separate time series for xreg. In fact, you don't need to create factors for your promo/day-of-week variables because they are already coded as 0/1. Thus, something like X <- G[,-c(1,2,3,5)]; X$Price <- log(X$Price) would suffice. (Aside: why are you using Time as a covariate? There doesn't appear to be any trend in the data.)
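As a sketch of that xreg construction, assuming G has exactly the thirteen columns printed above (a one-row stand-in is used here so the example is self-contained):

```r
# One-row stand-in for the data frame G shown above.
G <- data.frame(Date = "4/1/2013", Day = "Mon", Units = 351, Price = 5.06,
                Time = 1, PromoOne = 1, PromoTwo = 0,
                Mon = 1, Tue = 0, Wed = 0, Thu = 0, Fri = 0, Sat = 0)

# Drop Date, Day, Units, and Time (columns 1, 2, 3, 5); keep Price,
# the promo indicators, and the day-of-week dummies.
X <- G[, -c(1, 2, 3, 5)]
X$Price <- log(X$Price)

# auto.arima() expects xreg as a numeric matrix.
X <- as.matrix(X)
```

You would then pass this directly, e.g. auto.arima(log(G$Units), xreg = X).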
4) Yes, log-transforming the variables where you did is fine, but I'm curious as to why the price covariate needs to be log-transformed.