
How to convert hourly data into a time series in R

I have hourly data organised by date. Its dput output is shown below:

    df <- structure(list(trafficdate = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("2016-07-04", 
"2016-07-03", "2016-07-02", "2016-07-01", "2016-06-30", "2016-06-29", 
"2016-06-28", "2016-06-27", "2016-06-26", "2016-06-25", "2016-06-24", 
"2016-06-23", "2016-06-22", "2016-06-21", "2016-06-20", "2016-06-19", 
"2016-06-18", "2016-06-17", "2016-06-16", "2016-06-15", "2016-06-14", 
"2016-06-13", "2016-06-12", "2016-06-11", "2016-06-10", "2016-06-09", 
"2016-06-08", "2016-06-07", "2016-06-06"), class = "factor"), 
    days = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 7L, 
    7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
    7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
    6L, 6L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L
    ), .Label = c("1", "2", "3", "4", "5", "6", "7"), class = "factor"), 
    hourofday = structure(c(15L, 14L, 13L, 12L, 11L, 10L, 9L, 
    8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 24L, 23L, 22L, 21L, 20L, 
    19L, 18L, 17L, 16L, 15L, 14L, 13L, 12L, 11L, 10L, 9L, 8L, 
    7L, 6L, 5L, 4L, 3L, 2L, 1L, 24L, 23L, 22L, 21L, 20L, 19L, 
    18L, 17L, 16L, 15L, 14L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 
    6L, 5L, 4L, 3L, 2L, 1L, 24L, 23L, 22L, 21L, 20L, 19L, 18L, 
    17L, 16L, 15L, 14L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 
    4L, 3L, 2L, 1L, 24L, 23L, 22L, 21L, 20L, 19L, 18L, 17L, 16L, 
    15L, 14L, 13L, 12L), .Label = c("0", "1", "2", "3", "4", 
    "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
    "16", "17", "18", "19", "20", "21", "22", "23"), class = "factor"), 
    imps = c(22449L, 44921L, 38551L, 33060L, 28389L, 21660L, 
    13555L, 7648L, 3309L, 1545L, 1201L, 1392L, 2039L, 3829L, 
    9282L, 22813L, 42256L, 62132L, 63919L, 56453L, 50110L, 47200L, 
    43093L, 37191L, 34383L, 32337L, 32126L, 30801L, 27764L, 23211L, 
    15409L, 7220L, 2895L, 1277L, 1246L, 1341L, 2255L, 4332L, 
    10121L, 20168L, 30149L, 41926L, 40691L, 36107L, 34296L, 35386L, 
    34835L, 32610L, 30374L, 28032L, 26973L, 26479L, 23865L, 20225L, 
    14656L, 8265L, 3468L, 1621L, 1134L, 1429L, 2114L, 4106L, 
    9292L, 20356L, 30563L, 37327L, 39601L, 35267L, 32680L, 32004L, 
    33824L, 31531L, 30743L, 29922L, 26789L, 25735L, 22745L, 18612L, 
    12459L, 7528L, 4000L, 1803L, 1219L, 1523L, 2429L, 3897L, 
    8603L, 19822L, 35675L, 49282L, 46619L, 38847L, 31123L, 31114L, 
    33556L, 31286L, 31837L, 34010L, 30823L, 28219L), clicks = c(1152L, 
    2327L, 2076L, 1591L, 1429L, 1088L, 573L, 387L, 154L, 82L, 
    65L, 85L, 119L, 218L, 476L, 1224L, 2326L, 3476L, 3667L, 3003L, 
    2675L, 2572L, 2270L, 1902L, 1835L, 1652L, 1641L, 1552L, 1418L, 
    1235L, 896L, 439L, 177L, 68L, 74L, 78L, 151L, 220L, 519L, 
    1049L, 1528L, 2210L, 2210L, 1965L, 1702L, 1733L, 1756L, 1627L, 
    1422L, 1406L, 1311L, 1192L, 1190L, 1052L, 717L, 434L, 183L, 
    98L, 67L, 66L, 114L, 205L, 491L, 986L, 1450L, 1828L, 2000L, 
    1618L, 1507L, 1514L, 1523L, 1503L, 1451L, 1527L, 1284L, 1151L, 
    1080L, 853L, 596L, 367L, 241L, 88L, 61L, 52L, 122L, 218L, 
    394L, 927L, 1732L, 2483L, 2296L, 1904L, 1686L, 1473L, 1572L, 
    1514L, 1532L, 1681L, 1547L, 1404L), cost = c(994.58, 2022.92, 
    1813, 1381.55, 1242.87, 948.33, 489.06, 336.46, 134.6, 71.34, 
    55.9, 72.56, 100.9, 183.32, 409.89, 979.73, 1919.3, 2881.13, 
    3134.75, 2548.11, 2259.19, 2211.79, 1951.18, 1628.12, 1573.57, 
    1437.47, 1433.4, 1333.47, 1223.9, 1060.35, 758.45, 377.07, 
    150.29, 58.76, 64.27, 68.38, 129.65, 180.64, 426.02, 848.24, 
    1283.77, 1862.09, 1871.18, 1687.33, 1467.3, 1494.57, 1499.41, 
    1409.43, 1224.21, 1221.53, 1134.31, 1047.37, 1034.99, 901.76, 
    620.62, 369.46, 150.13, 80.53, 57.22, 58.98, 99.64, 175.03, 
    421.26, 819.02, 1232.48, 1562.16, 1719.99, 1383.76, 1285.62, 
    1300.93, 1312.65, 1288.82, 1272.01, 1342.58, 1127.99, 994.88, 
    928.75, 746.2, 512.98, 324.93, 212.81, 76.38, 53.37, 46.86, 
    102.58, 185.8, 340.84, 772.18, 1483.06, 2122.35, 1993.3, 
    1662.02, 1460.08, 1280.83, 1358.43, 1306.82, 1341.35, 1467.66, 
    1342.47, 1218.92), AveCPC = c(0.86, 0.87, 0.87, 0.87, 0.87, 
    0.87, 0.85, 0.87, 0.87, 0.87, 0.86, 0.85, 0.85, 0.84, 0.86, 
    0.8, 0.83, 0.83, 0.85, 0.85, 0.84, 0.86, 0.86, 0.86, 0.86, 
    0.87, 0.87, 0.86, 0.86, 0.86, 0.85, 0.86, 0.85, 0.86, 0.87, 
    0.88, 0.86, 0.82, 0.82, 0.81, 0.84, 0.84, 0.85, 0.86, 0.86, 
    0.86, 0.85, 0.87, 0.86, 0.87, 0.87, 0.88, 0.87, 0.86, 0.87, 
    0.85, 0.82, 0.82, 0.85, 0.89, 0.87, 0.85, 0.86, 0.83, 0.85, 
    0.85, 0.86, 0.86, 0.85, 0.86, 0.86, 0.86, 0.88, 0.88, 0.88, 
    0.86, 0.86, 0.87, 0.86, 0.89, 0.88, 0.87, 0.87, 0.9, 0.84, 
    0.85, 0.87, 0.83, 0.86, 0.85, 0.87, 0.87, 0.87, 0.87, 0.86, 
    0.86, 0.88, 0.87, 0.87, 0.87), AvePos = c(2.98, 2.97, 3.07, 
    3.03, 3.1, 3.11, 3.06, 2.96, 2.88, 2.74, 2.78, 2.85, 2.71, 
    2.9, 2.76, 2.64, 2.72, 2.78, 2.8, 2.9, 3.01, 3.08, 3.07, 
    3.09, 3.1, 3.01, 2.99, 3, 2.99, 3.12, 2.94, 3.08, 2.84, 2.62, 
    2.77, 2.69, 2.6, 2.75, 2.85, 2.7, 2.77, 2.75, 2.88, 3.03, 
    2.97, 3.04, 3.13, 3.08, 3.11, 3.05, 3.09, 3.11, 3.12, 3.11, 
    3.06, 3.04, 2.95, 2.62, 2.63, 2.75, 2.81, 2.78, 2.67, 2.6, 
    2.83, 2.84, 2.9, 2.89, 2.87, 2.97, 2.94, 2.98, 2.98, 3, 3.02, 
    3.09, 3.01, 3.06, 2.99, 3.03, 2.81, 2.7, 2.69, 2.75, 2.8, 
    2.6, 2.72, 2.55, 2.8, 2.83, 2.95, 2.91, 3.05, 3.07, 3.07, 
    2.97, 3, 3, 3.12, 2.99)), .Names = c("trafficdate", "days", 
"hourofday", "imps", "clicks", "cost", "AveCPC", "AvePos"), class = c("data.table", 
"data.frame"), row.names = c(NA, -100L))

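I want to produce hourly forecasts for this series using several methods, and I'd like to know the best way to convert the data into a time series. I tried frequency = 24 with both the ts and the xts approach, but I couldn't work out what the x-axis values mean. The series starts at 2016-06-06 00:00 and ends at 2016-07-04 14:00.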
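The sketch below shows what the x-axis of a ts object means. It assumes the full series is a gap-free hourly sequence from 2016-06-06 00:00 to 2016-07-04 14:00 and treats all times as UTC. The values are placeholder zeros, not the poster's data. With frequency = 24, one time unit is one day and each hour adds 1/24. start = c(1, 1) labels the first day as day 1 and the 00:00 slot as position 1. The names n_hours, origin and clock are illustrative.

```r
# Count the hourly slots between the stated start and end (inclusive).
n_hours <- length(seq(as.POSIXct("2016-06-06 00:00", tz = "UTC"),
                      as.POSIXct("2016-07-04 14:00", tz = "UTC"),
                      by = "hour"))
n_hours   # 687

# Placeholder values (zeros); substitute the real series sorted oldest first.
y <- ts(numeric(n_hours), frequency = 24, start = c(1, 1))

# plot(y) uses time(y) on the x-axis: day number + (hour of day) / 24.
as.numeric(time(y))[1:3]   # 1.000000 1.041667 1.083333
end(y)                     # 29 15, i.e. day 29, 15th hourly slot (14:00)

# Translate x-axis values back to clock time, taking day 1 = 2016-06-06.
origin <- as.POSIXct("2016-06-06 00:00", tz = "UTC")
clock  <- origin + round((as.numeric(time(y)) - 1) * 24) * 3600
format(range(clock), "%Y-%m-%d %H:%M", tz = "UTC")
```

By the same convention, the 100-row sample in the dput starts at 2016-06-30 11:00. That sample on its own would use start = c(25, 12).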

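I tried xts(df$values, order.by = df$date_and_time), where date_and_time is sorted in time order. The time series plot looks fine, but with auto.arima and exponential smoothing the forecasts are far out of range and even negative.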
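The question does not show the modelling code, so the cause of the negative forecasts can't be pinned down from here. One common remedy for strictly positive series such as impressions is to fit the model to log values and exponentiate the forecasts. The exponentiated forecasts cannot be negative.

The sketch below swaps in base R's arima() and uses the 100 imps values from the dput. The model order (1,0,0)(0,1,1) with period 24 is an illustrative assumption, not a tuned model. In the forecast package, auto.arima(y, lambda = 0) and ets(y, lambda = 0) apply the same log transform.

```r
# imps copied from the dput above (newest first); rev() puts them oldest first.
imps_newest_first <- c(22449L, 44921L, 38551L, 33060L, 28389L, 21660L,
  13555L, 7648L, 3309L, 1545L, 1201L, 1392L, 2039L, 3829L,
  9282L, 22813L, 42256L, 62132L, 63919L, 56453L, 50110L, 47200L,
  43093L, 37191L, 34383L, 32337L, 32126L, 30801L, 27764L, 23211L,
  15409L, 7220L, 2895L, 1277L, 1246L, 1341L, 2255L, 4332L,
  10121L, 20168L, 30149L, 41926L, 40691L, 36107L, 34296L, 35386L,
  34835L, 32610L, 30374L, 28032L, 26973L, 26479L, 23865L, 20225L,
  14656L, 8265L, 3468L, 1621L, 1134L, 1429L, 2114L, 4106L,
  9292L, 20356L, 30563L, 37327L, 39601L, 35267L, 32680L, 32004L,
  33824L, 31531L, 30743L, 29922L, 26789L, 25735L, 22745L, 18612L,
  12459L, 7528L, 4000L, 1803L, 1219L, 1523L, 2429L, 3897L,
  8603L, 19822L, 35675L, 49282L, 46619L, 38847L, 31123L, 31114L,
  33556L, 31286L, 31837L, 34010L, 30823L, 28219L)

# Oldest row in the sample is 2016-06-30 11:00: day 25, hourly slot 12.
y <- ts(rev(imps_newest_first), frequency = 24, start = c(25, 12))

# Fit a seasonal ARIMA to log(y); the orders here are illustrative, not tuned.
fit <- arima(log(y), order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 1), period = 24))

# Forecast 24 hours ahead on the log scale, then back-transform with exp(),
# which is always positive.
fc <- exp(predict(fit, n.ahead = 24)$pred)
round(fc)
```

Because the model is fitted on the log scale, exp() of the point forecast estimates the median of the original series rather than its mean.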

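When I use the ts function with frequency = 24, the forecasts are better, but when I plot the series I can't interpret the values on the x-axis. I'm trying to forecast several variables at once: cost, clicks and impressions (imps).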
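To forecast cost, clicks and imps together, one option is to keep them as columns of a single multi-column ts (frequency = 24) and fit one model per column. The sketch below uses base R's HoltWinters(), which is exponential smoothing with additive seasonality. It fits on log values so the back-transformed forecasts stay positive. forecast_columns() is a hypothetical helper, and the data are simulated stand-ins, not the poster's numbers.

```r
# Hypothetical helper: one log-scale Holt-Winters model per column.
forecast_columns <- function(data, cols, start, h = 24) {
  # A single multi-column ("mts") object; each column is one variable.
  y <- ts(as.matrix(data[cols]), frequency = 24, start = start)
  sapply(cols, function(col) {
    fit <- HoltWinters(log(y[, col]))           # additive seasonality on the log scale
    exp(as.numeric(predict(fit, n.ahead = h)))  # back-transform: forecasts stay positive
  })
}

# Simulated stand-in data (NOT the poster's numbers): 4 days with a daily cycle.
set.seed(42)
hour  <- rep(0:23, times = 4)
shape <- sin(2 * pi * (hour - 9) / 24)          # daily pattern on the log scale
sim_var <- function(scale) {
  # AR(1) noise on the log scale keeps the series positive and autocorrelated.
  as.numeric(scale * exp(shape + arima.sim(list(ar = 0.8), n = 96, sd = 0.05)))
}
sim <- data.frame(imps = sim_var(30000), clicks = sim_var(1500), cost = sim_var(1300))

fc <- forecast_columns(sim, c("imps", "clicks", "cost"), start = c(1, 1))
dim(fc)   # 24 rows (hours ahead) by 3 columns (variables)
```

Each column of fc holds the next 24 hourly forecasts for one variable. Swapping HoltWinters() for another model only changes the line inside sapply().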

Thanks.

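I think that for hourly time series data you should stick with xts rather than ts. xts handles hourly data better because it indexes every observation by an actual date-time, and it is a subclass of zoo.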
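The sketch below builds a real timestamp index from the dput's two factor columns, assuming the dates and hours are in UTC. The question does not state a time zone. There are two pitfalls. First, as.numeric() on a factor returns level codes rather than labels. Second, the dput is sorted newest first. The xts step runs only if the xts package is installed.

```r
# Four rows copied from the dput in the question, in its newest-first order.
df <- data.frame(
  trafficdate = factor(c("2016-07-04", "2016-07-04", "2016-07-03", "2016-07-03")),
  hourofday   = factor(c("1", "0", "23", "22"), levels = as.character(0:23)),
  imps        = c(3829L, 9282L, 22813L, 42256L)
)

# as.numeric() on a factor gives level codes ("0" is code 1), so go via as.character().
hour <- as.numeric(as.character(df$hourofday))
df$timestamp <- as.POSIXct(as.character(df$trafficdate), tz = "UTC") + hour * 3600

# Time-series constructors expect oldest-first order.
df <- df[order(df$timestamp), ]
format(df$timestamp, "%Y-%m-%d %H:%M", tz = "UTC")

# The xts object carries real timestamps, so plots show dates and hours.
if (requireNamespace("xts", quietly = TRUE)) {
  x <- xts::xts(df$imps, order.by = df$timestamp)
  print(x)
}
```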

I had a similar use case and split the timestamp into hours:

    library(lubridate)
    # hour() extracts the hour of day (0-23) from a date-time; as.factor() makes it categorical
    train$hour <- as.factor(hour(train$timestamp))

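This is how I put the hour of day on the x-axis, and it makes the plot easier to interpret. You can split the timestamp further depending on your use case.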
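The answer's lubridate::hour() call has a base R counterpart, format(x, "%H"). The sketch below uses four rows from the question's dput, assuming UTC timestamps. It turns the timestamps into an hour-of-day factor and averages impressions per hour. The result is indexed by hour, which is what ends up on the x-axis when it is plotted.

```r
# Rows copied from the question's dput: hours 22 and 23 on 2016-07-02 and 2016-07-03.
timestamp <- as.POSIXct(c("2016-07-03 23:00", "2016-07-03 22:00",
                          "2016-07-02 23:00", "2016-07-02 22:00"), tz = "UTC")
imps <- c(22813L, 42256L, 20168L, 30149L)

# Base R counterpart of lubridate::hour(): hour of day as a factor with levels "00" to "23".
hour <- factor(format(timestamp, "%H", tz = "UTC"), levels = sprintf("%02d", 0:23))

# Average impressions per hour of day; hours with no data come out as NA.
profile <- tapply(imps, hour, mean)
profile[c("22", "23")]   # 22: 36202.5, 23: 21490.5
```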


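Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original URL. For any questions, contact yoyou2525@163.com.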

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM