
How to convert this data into time series for arima model forecasting

 s
      X   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1  2012 24.78 26.82 29.75 31.19 31.87 31.00 28.48 27.39 27.08 26.55 24.36 23.62
2  2013 24.82 26.04 28.83 30.85 32.44 29.70 27.86 27.66 27.73 27.00 24.87 22.94
3  2014 24.01 25.75 29.08 31.83 31.23 33.08 29.88 28.14 27.40 27.11 25.38 24.37
4  2015 24.60 26.11 29.19 30.71 32.69 28.90 29.21 28.24 27.58 27.82 25.37 24.71
5  2016 25.20 27.62 29.51 31.86 32.34 28.64 27.98 28.36 27.12 26.51 25.69 25.12
6  2017 25.28 26.88 29.55 31.88 32.74 29.89 28.41 27.60 27.72 27.23 25.43 24.08
7  2018 24.84 26.47 29.40 31.20 31.10 30.28 28.30 27.33 27.55 27.40 26.98 24.77
8  2019 23.73 26.75 29.57 31.59 32.53 31.30 29.48 27.78 27.54 27.05 25.44 24.46
9  2020 25.41 26.75 29.30 31.37 32.98 30.05 28.23 27.53 27.68 27.01 25.57 22.86
10 2021 24.70 25.90 29.62 31.42 31.68 30.17 28.13 28.08 27.68 27.29 25.59 23.16

How to convert this into time series for forecasting?

You can use the pivot_longer() function from the tidyr package to reshape the data into long format. The ts() function can then convert it to a time series.

# recreate the original data
data1 <- structure(list(X=c(2012,2013,2014,2015,2016,2017,2018,2019,2020,2021),
               Jan=c(24.78,24.82,24.01,24.6,25.2,25.28,24.84,23.73,25.41,24.7),
               Feb=c(26.82,26.04,25.75,26.11,27.62,26.88,26.47,26.75,26.75,25.9),
               Mar=c(29.75,28.83,29.08,29.19,29.51,29.55,29.4,29.57,29.3,29.62),
               Apr=c(31.19,30.85,31.83,30.71,31.86,31.88,31.2,31.59,31.37,31.42),
               May=c(31.87,32.44,31.23,32.69,32.34,32.74,31.1,32.53,32.98,31.68),
               Jun=c(31,29.7,33.08,28.9,28.64,29.89,30.28,31.3,30.05,30.17),
               Jul=c(28.48,27.86,29.88,29.21,27.98,28.41,28.3,29.48,28.23,28.13),
               Aug=c(27.39,27.66,28.14,28.24,28.36,27.6,27.33,27.78,27.53,28.08),
               Sep=c(27.08,27.73,27.4,27.58,27.12,27.72,27.55,27.54,27.68,27.68),
               Oct=c(26.55,27,27.11,27.82,26.51,27.23,27.4,27.05,27.01,27.29),
               Nov=c(24.36,24.87,25.38,25.37,25.69,25.43,26.98,25.44,25.57,25.59),
               Dec=c(23.62,22.94,24.37,24.71,25.12,24.08,24.77,24.46,22.86,23.16)),
          row.names=c(NA,-10L),
          class=c("tbl_df","tbl","data.frame"))

# pivot to longer format
library(tidyr)
data2 <- pivot_longer(data1,-X,values_to='value')

# convert to monthly timeseries starting at Jan 2012 ending at Dec 2021
timeseries <- ts(data2$value,start=2012,end=2021+11/12,frequency=12)
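
Since the goal is ARIMA forecasting, here is a minimal sketch of what could come next once a monthly ts object exists. It rebuilds the series from the values in the question so that it runs on its own, and it uses only base R (stats::arima and predict). The model orders below are an assumption chosen purely for illustration, not a recommendation for this data; in practice they would be picked from diagnostics such as ACF/PACF plots or an automatic search.

```r
# Monthly values from the question, entered year by year (Jan..Dec)
vals <- c(
  24.78, 26.82, 29.75, 31.19, 31.87, 31.00, 28.48, 27.39, 27.08, 26.55, 24.36, 23.62,  # 2012
  24.82, 26.04, 28.83, 30.85, 32.44, 29.70, 27.86, 27.66, 27.73, 27.00, 24.87, 22.94,  # 2013
  24.01, 25.75, 29.08, 31.83, 31.23, 33.08, 29.88, 28.14, 27.40, 27.11, 25.38, 24.37,  # 2014
  24.60, 26.11, 29.19, 30.71, 32.69, 28.90, 29.21, 28.24, 27.58, 27.82, 25.37, 24.71,  # 2015
  25.20, 27.62, 29.51, 31.86, 32.34, 28.64, 27.98, 28.36, 27.12, 26.51, 25.69, 25.12,  # 2016
  25.28, 26.88, 29.55, 31.88, 32.74, 29.89, 28.41, 27.60, 27.72, 27.23, 25.43, 24.08,  # 2017
  24.84, 26.47, 29.40, 31.20, 31.10, 30.28, 28.30, 27.33, 27.55, 27.40, 26.98, 24.77,  # 2018
  23.73, 26.75, 29.57, 31.59, 32.53, 31.30, 29.48, 27.78, 27.54, 27.05, 25.44, 24.46,  # 2019
  25.41, 26.75, 29.30, 31.37, 32.98, 30.05, 28.23, 27.53, 27.68, 27.01, 25.57, 22.86,  # 2020
  24.70, 25.90, 29.62, 31.42, 31.68, 30.17, 28.13, 28.08, 27.68, 27.29, 25.59, 23.16   # 2021
)

# Monthly series starting in January 2012
y <- ts(vals, start = c(2012, 1), frequency = 12)

# Assumed model: seasonal ARIMA(0,1,1)(0,1,1)[12]; the orders are illustrative only
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the 12 months following the end of the series
fc <- predict(fit, n.ahead = 12)
fc$pred  # point forecasts as a monthly ts
fc$se    # corresponding standard errors
```

predict() returns the forecasts as a ts that continues from the last observed month, so they can be plotted alongside y without any further date handling.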

The question is how to convert a data frame of the form shown in the Note at the end into a ts object. In particular, we assume that the only NAs are at the beginning, if the data does not start in January, or at the end, if it does not end in December.

After removing the year column, transpose the data using t, unravel the result into a vector using c, and then specify the appropriate start year and frequency. Finally, because a series that does not start in January is assumed to begin with NAs, remove them with na.omit. If we know the data starts in January and ends in December, the na.omit can be dropped. No packages are used.

(If there are NAs at the beginning and/or end, the code below still works. If there are also internal NAs, use na.trim from zoo in place of na.omit.)

na.omit(ts(c(t(s[, -1])), start = s[1, 1], frequency = 12))
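
To illustrate the partial-year case described above, here is a small sketch in base R. The data frame s2 is hypothetical: it takes the 2020 and 2021 rows from the question and blanks out the first two and the last two months, so that the series effectively runs from March 2020 to October 2021.

```r
# Hypothetical frame: the 2020 and 2021 rows from the question, with
# Jan/Feb 2020 and Nov/Dec 2021 set to NA to mimic partial first and last years
s2 <- data.frame(
  X   = 2020:2021,
  Jan = c(NA, 24.70),    Feb = c(NA, 25.90),    Mar = c(29.30, 29.62),
  Apr = c(31.37, 31.42), May = c(32.98, 31.68), Jun = c(30.05, 30.17),
  Jul = c(28.23, 28.13), Aug = c(27.53, 28.08), Sep = c(27.68, 27.68),
  Oct = c(27.01, 27.29), Nov = c(25.57, NA),    Dec = c(22.86, NA)
)

# t() puts months in rows and years in columns, so c() reads the values
# year by year, which is the chronological order ts() expects
full <- ts(c(t(s2[, -1])), start = s2[1, 1], frequency = 12)

# na.omit() drops the leading and trailing NAs and adjusts start/end to match
trimmed <- na.omit(full)

start(trimmed)   # year and month of the first non-missing value
end(trimmed)     # year and month of the last non-missing value
```

The transpose matters: c() on the untransposed data would read column by column, giving all Januaries first, then all Februaries, and so on, which is not chronological.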

Note

s <- structure(list(X = 2012:2021, Jan = c(24.78, 24.82, 24.01, 24.6, 
25.2, 25.28, 24.84, 23.73, 25.41, 24.7), Feb = c(26.82, 26.04, 
25.75, 26.11, 27.62, 26.88, 26.47, 26.75, 26.75, 25.9), Mar = c(29.75, 
28.83, 29.08, 29.19, 29.51, 29.55, 29.4, 29.57, 29.3, 29.62), 
    Apr = c(31.19, 30.85, 31.83, 30.71, 31.86, 31.88, 31.2, 31.59, 
    31.37, 31.42), May = c(31.87, 32.44, 31.23, 32.69, 32.34, 
    32.74, 31.1, 32.53, 32.98, 31.68), Jun = c(31, 29.7, 33.08, 
    28.9, 28.64, 29.89, 30.28, 31.3, 30.05, 30.17), Jul = c(28.48, 
    27.86, 29.88, 29.21, 27.98, 28.41, 28.3, 29.48, 28.23, 28.13
    ), Aug = c(27.39, 27.66, 28.14, 28.24, 28.36, 27.6, 27.33, 
    27.78, 27.53, 28.08), Sep = c(27.08, 27.73, 27.4, 27.58, 
    27.12, 27.72, 27.55, 27.54, 27.68, 27.68), Oct = c(26.55, 
    27, 27.11, 27.82, 26.51, 27.23, 27.4, 27.05, 27.01, 27.29
    ), Nov = c(24.36, 24.87, 25.38, 25.37, 25.69, 25.43, 26.98, 
    25.44, 25.57, 25.59), Dec = c(23.62, 22.94, 24.37, 24.71, 
    25.12, 24.08, 24.77, 24.46, 22.86, 23.16)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
