Using Time Series in R

I believe that working with time series in R has been discussed at length in "Time series in R".

However, all the SO posts and books I have read so far ( https://media.readthedocs.org/pdf/a-little-book-of-r-for-time-series/latest/a-little-book-of-r-for-time-series.pdf ) assume that the dataset is a numeric array. What if my data contains categorical columns as well? For instance:

> head(sassign)
  acctnum gender state   zip zip3 first last book_ nonbook_ total_ purch child youth cook do_it refernce
1   10001      M    NY 10605  106    49   29   109      248    357    10     3     2    2     0        1
2   10002      M    NY 10960  109    39   27    35      103    138     3     0     1    0     1        0
3   10003      F    PA 19146  191    19   15    25      147    172     2     0     0    2     0        0
4   10004      F    NJ 07016  070     7    7    15      257    272     1     0     0    0     0        1
5   10005      F    NY 10804  108    15   15    15      134    149     1     0     0    1     0        0
6   10006      F    NY 11366  113     7    7    15       98    113     1     0     1    0     0        0
  art geog buyer
1   0    2    no
2   0    1    no
3   0    0    no
4   0    0    no
5   0    0    no
6   0    0   yes

Here is what I did to create a time-series object from the data above. My objective is to group the rows of sassign by "last" and then build a time-series object based on "last".

library(dplyr)  # provides group_by()
t_sassign <- data.frame(group_by(sassign, last))
# Monthly series starting in January 2014
t_sassign <- ts(t_sassign, start = c(2014, 1), frequency = 12)

"last" is the column giving the number of months since the last purchase. The code above runs, but it throws a warning:

Warning message:
In data.matrix(data) : NAs introduced by coercion

Why is this happening? My hypothesis is that I am getting NAs because R doesn't know how to group mixed data, that is, categorical columns such as state alongside continuous columns such as book_. Am I correct?

If my hypothesis is correct, I am not sure how to handle mixed data. Had it all been categorical, I would have used cross-tabulations. Had it all been continuous, I would have used functions such as sum or median. With mixed data, however, I don't know what to do.

I'd truly appreciate your thoughts.

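No. The NAs most likely appear because ts() fails to convert the character values of "gender", "state" and "buyer" to numeric (ts() passes a data frame through data.matrix(), which is where the warning is raised). When these columns are factors, no warning message appears.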
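
The difference between the two cases can be seen in isolation on a small vector. This is only a sketch in base R with made-up values (gender_chr is a hypothetical stand-in for the "gender" column, not the actual sassign data):

```r
# Character values that are not numbers cannot be parsed by as.numeric(),
# so each one becomes NA. Run without suppressWarnings() to see the
# "NAs introduced by coercion" warning.
gender_chr <- c("M", "M", "F")
codes_chr <- suppressWarnings(as.numeric(gender_chr))
print(codes_chr)

# A factor stores integer level codes (levels are sorted alphabetically by
# default, so "F" is level 1 and "M" is level 2). Converting it to integer
# therefore succeeds without a warning.
gender_fct <- factor(gender_chr)
codes_fct <- as.integer(gender_fct)
print(codes_fct)
```

Note that the integer codes are just labels for the levels. They make the conversion succeed, but they are not meaningful quantities for time-series analysis.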

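library(dplyr)  # provides group_by()

# stringsAsFactors = TRUE reads gender, state and buyer as factors
# (this was the default before R 4.0.0)
sassign <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "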
acctnum gender state   zip zip3 first last book_ nonbook_ total_ purch child youth cook do_it refernce art geog buyer
1   10001      M    NY 10605  106    49   29   109      248    357    10     3     2    2     0        1 0    2    no
2   10002      M    NY 10960  109    39   27    35      103    138     3     0     1    0     1        0 0    1    no
3   10003      F    PA 19146  191    19   15    25      147    172     2     0     0    2     0        0 0    0    no
4   10004      F    NJ 07016  070     7    7    15      257    272     1     0     0    0     0        1 0    0    no
5   10005      F    NY 10804  108    15   15    15      134    149     1     0     0    1     0        0 0    0    no
6   10006      F    NY 11366  113     7    7    15       98    113     1     0     1    0     0        0 0    0   yes
")
t_sassign <- data.frame(group_by(sassign, last))
t_sassign <- ts(t_sassign, start = c(2014, 1), frequency = 12)
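
As for handling mixed data, one possible approach is to summarise each kind of column separately per value of "last" and then combine the results into a single all-numeric table. The following is a minimal sketch in base R, not a definitive recipe: toy is a made-up data frame with the same kinds of columns as sassign, and the choice of sum for numeric columns and counts for categorical columns is an assumption about what is wanted.

```r
# Small made-up data frame with the same kinds of columns as sassign.
toy <- data.frame(
  last   = c(7, 7, 15, 15, 27),
  gender = c("F", "F", "F", "M", "M"),
  total_ = c(272, 113, 172, 149, 138),
  stringsAsFactors = FALSE
)

# Numeric column: sum within each value of `last`.
totals <- aggregate(total_ ~ last, data = toy, FUN = sum)

# Categorical column: cross-tabulate to count rows per level
# within each value of `last` (one column per level).
gender_counts <- as.data.frame.matrix(table(toy$last, toy$gender))
gender_counts$last <- as.numeric(rownames(gender_counts))

# Combine into one all-numeric table keyed by `last`.
by_last <- merge(totals, gender_counts, by = "last")
print(by_last)

# Assumption: rows are treated as consecutive months starting in
# January 2014, as in the question. This ignores the gaps between
# the actual `last` values.
ts_obj <- ts(by_last[, c("total_", "F", "M")],
             start = c(2014, 1), frequency = 12)
print(ts_obj)
```

Because every column is already numeric at this point, nothing has to be coerced inside ts(), so the warning does not arise.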
