
Convert data frame to time series for prediction in R

I retrieve data from MySQL in the following format:

date         newCustomers
2016-07-27   31
2016-07-26   3

The data starts on 2015-02-25 and there is one entry for each day. I want to convert this data frame to a time series for forecasting purposes.

I tried the following:

dataTimeSeries <- ts(data, start=c(2015,2,25), frequency=365.25)

and it gave me the warning "In data.matrix(data) : NAs introduced by coercion". On checking what's in dataTimeSeries, this is what I found:

         date  day
2016.000   NA   31
2016.003   NA    3
2016.005   NA    2
2016.008   NA    0

What am I doing wrong? Please point me in the right direction.

UPDATE: As suggested, I tried

dataTimeSeries <- ts(data$newCustomers, start=c(2015,2,25), frequency=365.25)

and it gave me the following result:

Time Series:
Start = 2015.00273785079 
End = 2015.9993155373 
Frequency = 365.25 
  [1]   31    3    2    0  101   69    8    4   15    3    1   22   47   85  359    6    7    2  134   44   20   61    2    0    4 2373 4243    7   31   11    2    0   25 1689   24   74
 [37]   22    0    1  336  373   14   11  145    7    0    1   19   49  522   19    1   39 1611    9  675   21    1   45    4  156  180  747  265  169    0    0    4    7    3    4   10
 [73]   64    1    3    5    2   13   15    0    6    0   13    2   13   10    5   14   16   28  134    8    2    0    0    9   29    7   79   17    1    4  167    6   64  334   14    0
[109]    0   13   17   57   66    3    0    0   25    2    4   22   16    2    0   23   23  169 9912   24    8    3  154    3    2   29   29  243    0    6    2   72   66    7    1    0
[145]   24  208   13    6    7   10    4   54   79   72    9   29   31  208  224   18   50   65  152   50   10   55  107  249  178    3    0    0  627   19  220   20  285    0    1   11
[181]   26   25   88    9    2    7   64   54  212  295   37   49   19  144   30   78   29   97  210  143    4  294    2   34  642   24    0    0    1    4    0    0    0    0    0    0
[217]    2    3    9    0    0   62    6   16    0   12    0   21    3    6    5    8    1    1    0    3   40   16    1    0    0   66    0    0    1    8    6    1   14   26    4    4
[253]  285    4    0    0    0    3    1    0   28    0    0   24  360    0    0    2    3    0   11  294  578    1    4    0    0   19    2    7   10    0    0    1   20    1   59   19
[289]    2    0    0    9   19   12    4   10    5    4    5    5    7   38   10    5    6    9   18   22   30   28   13   14   22   22   35   12    6    3    3   15    3    3   28    1
[325]    0    0    7   45   21   14   21    0    0   22   14   17  799    7    0    3    8   20   21  107   75    3    3   39   36  137   42   39    6   16  113   11    6   10    8    6
[361]    6    8   21   12   81

which is not correct.

This should work, since you only need to feed the data (and not the times) to ts():

dataTimeSeries <- ts(data$newCustomers, ...)
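One likely reason the attempt in the update still looks wrong: ts() reads start as c(year, position within the year), so c(2015,2,25) is taken as the 2nd observation of 2015 (consistent with the printed Start = 2015.0027...), and the rows shown in the question arrive newest first. Below is a minimal sketch of one way to handle both points. It assumes the date column comes back as text in YYYY-MM-DD form, and it uses made-up counts in a hypothetical stand-in data frame rather than the real query result. The choice of frequency = 365 is also an assumption; 365.25, as used in the question, is accepted by ts() as well.

```r
# Hypothetical stand-in for the MySQL result: one row per day, newest first,
# with made-up counts (the real values are not reproduced here).
dates <- rev(seq(as.Date("2015-02-25"), as.Date("2016-07-27"), by = "day"))
data <- data.frame(date = as.character(dates),
                   newCustomers = seq_along(dates),
                   stringsAsFactors = FALSE)

# Parse the dates and put the rows in chronological order.
data$date <- as.Date(data$date)
data <- data[order(data$date), ]

# ts() expects start = c(year, position within the year), not c(year, month, day).
first_day  <- data$date[1]
start_year <- as.integer(format(first_day, "%Y"))
start_doy  <- as.integer(format(first_day, "%j"))  # day of year of the first row

dataTimeSeries <- ts(data$newCustomers,
                     start = c(start_year, start_doy),
                     frequency = 365)
```

With daily data the "position within the year" is the day of the year, which is why the sketch derives it from the first date instead of passing the month and day separately.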

It's also possible that your data doesn't have regularly spaced intervals between observations. ts() assumes equally spaced observations, so it works best for data sets where the interval between observation dates is constant. See "Analyzing Daily/Weekly data using ts in R" for other methods of analyzing data that isn't necessarily equally spaced in time.
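Whether the observations really are equally spaced can be checked before calling ts(). The sketch below is only an illustration in base R: the dates vector is hypothetical and deliberately leaves out one day.

```r
# Hypothetical dates with one missing day (2015-02-27), for illustration only.
dates <- as.Date(c("2015-02-25", "2015-02-26", "2015-02-28", "2015-03-01"))

# Gaps between consecutive observations, in days.
gaps <- as.integer(diff(sort(dates)))
is_regular <- all(gaps == 1)

# Dates that are absent from the full daily sequence.
full_range <- seq(min(dates), max(dates), by = "day")
missing_dates <- full_range[!full_range %in% dates]
```

If is_regular is FALSE, missing_dates lists the days that would need to be filled in (or handled by a method for irregular series) before the data fits the equal-spacing assumption.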
