
Time series analysis applicability?

I have a sample data frame like this (the date column format is mm-dd-YYYY):

date            count     grp
01-09-2009       54        1
01-09-2009       100       2
01-09-2009       546       3
01-10-2009       67        4
01-11-2009       80        5
01-11-2009       45        6

I want to convert this data frame into a time series using ts(), but the problem is that the data frame has multiple values for the same date. Can time series methods be applied in this case?

  • Can I convert the data frame into a time series and build a model (ARIMA) that forecasts the count value on a daily basis?

  • Or should I forecast the count value based on grp? In that case I would have to select only the grp and count columns of the data frame and skip the date column, so a daily forecast of count would not be possible?

  • Suppose I want to aggregate the count value per day. I tried the aggregate function, but there I have to specify the date values, and I have a very large data set. Is there any other option available in R?

Can somebody please suggest whether there is a better approach to follow? My assumption is that time series forecasting works only for bivariate data. Is this assumption right?

It seems like there are two aspects to your problem:

"I want to convert this data frame into a time series using ts(), but the problem is that the data frame has multiple values for the same date. Can time series methods be applied in this case?"

If you are happy to make use of the xts package, you could try:

dta2$date <- as.Date(dta2$date, format = "%m-%d-%Y")  # dates are mm-dd-YYYY
dtaXTS <- xts::as.xts(dta2[, 2:3], order.by = dta2$date)

which would result in:

> head(dtaXTS)
           count grp
2009-01-09    54   1
2009-01-09   100   2
2009-01-09   546   3
2009-01-10    67   4
2009-01-11    80   5
2009-01-11    45   6

The resulting object has the following classes:

> class(dtaXTS)
[1] "xts" "zoo"

You could then use your time series object either as a univariate time series, by referring to the selected variable, or as a multivariate time series. An example using the PerformanceAnalytics package:

PerformanceAnalytics::chart.TimeSeries(dtaXTS)

[Figure: multivariate time series chart]
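To make the "univariate" route concrete, here is a minimal sketch that selects the count column and sums it within each calendar day, so that every date ends up with a single value. It assumes the xts package is installed and rebuilds the question's six rows inline; the name daily_count is only illustrative.

```r
library(xts)

# Rebuild the example data from the question (dates are mm-dd-YYYY).
dta2 <- data.frame(
  date  = c("01-09-2009", "01-09-2009", "01-09-2009",
            "01-10-2009", "01-11-2009", "01-11-2009"),
  count = c(54, 100, 546, 67, 80, 45),
  grp   = 1:6
)
dta2$date <- as.Date(dta2$date, format = "%m-%d-%Y")

# xts accepts repeated index values, so the conversion itself succeeds.
dtaXTS <- xts(as.matrix(dta2[, c("count", "grp")]), order.by = dta2$date)

# Select the single variable to model and sum it within each day.
daily_count <- apply.daily(dtaXTS[, "count"], sum)
print(daily_count)
```

Summing is only one choice; mean or max could be passed instead of sum, depending on what a "daily count" should mean for the data.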

Side points

Concerning your second question:

"Can somebody please suggest what the better approach to follow is? My assumption is that time series forecasting works only for bivariate data. Is this assumption right?"

IMHO, this is rather broad. I would suggest that you use the created xts object and elaborate on the model you want to utilise and why. If it is a conceptual question about the nature of time series analysis, you may prefer to post your follow-up question on CrossValidated.


Data sourced via: dta2 <- read.delim(pipe("pbpaste"), sep = "") using the provided example (pbpaste reads the macOS clipboard).

Since daily forecasts are wanted, we need to aggregate to daily values. Using DF from the Note at the end, read the first two columns of the data into a zoo series z using read.zoo with the argument aggregate = sum. We could optionally convert that to a "ts" series (tser <- as.ts(z)), although this is unnecessary for many forecasting functions. In particular, looking at the source code of auto.arima, we see that it runs x <- as.ts(x) on its input before further processing. Finally, run auto.arima, forecast or another forecasting function.

library(forecast)
library(zoo)

z <- read.zoo(DF[1:2], format = "%m-%d-%Y", aggregate = sum)

auto.arima(z)

forecast(z)
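The six example rows collapse to only three daily values, which is too few to say much about a fitted model. The sketch below illustrates what a daily forecast looks like once more data are available. It is an illustration under stated assumptions, not the method above: the 60 daily counts are synthetic (simulated with rpois), and base R's arima() with a hand-picked AR(1) order stands in for auto.arima so that no extra package is needed. The names dates, counts, horizon and out are hypothetical.

```r
# Synthetic daily counts for illustration only (not the question's data).
set.seed(42)
dates  <- seq(as.Date("2009-01-09"), by = "day", length.out = 60)
counts <- rpois(length(dates), lambda = 200)

# One value per day, equally spaced, so a plain ts object is enough.
tser <- ts(counts)

# Hand-picked AR(1) model with a mean term; auto.arima would select the order itself.
fit <- arima(tser, order = c(1, 0, 0), method = "ML")

# Forecast the next 7 days and attach calendar dates to the predictions.
horizon <- 7
fc  <- predict(fit, n.ahead = horizon)
out <- data.frame(
  date     = max(dates) + seq_len(horizon),
  forecast = as.numeric(fc$pred),
  se       = as.numeric(fc$se)
)
print(out)
```

With the forecast package loaded, the corresponding calls would be fit <- auto.arima(tser) followed by forecast(fit, h = 7).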

Note: DF is given reproducibly here:

Lines <- "date            count     grp
01-09-2009       54        1
01-09-2009       100       2
01-09-2009       546       3
01-10-2009       67        4
01-11-2009       80        5
01-11-2009       45        6"
DF <- read.table(text = Lines, header = TRUE)
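For readers who prefer to stay in base R, the daily aggregation can also be sketched without zoo. The formula interface of aggregate() groups by every distinct date on its own, so no date values have to be listed by hand. The gap-filling step is an assumption about the data: it treats a day with no rows as a count of 0, which may or may not be appropriate. The names daily and all_days are illustrative.

```r
Lines <- "date count grp
01-09-2009 54 1
01-09-2009 100 2
01-09-2009 546 3
01-10-2009 67 4
01-11-2009 80 5
01-11-2009 45 6"
DF <- read.table(text = Lines, header = TRUE)
DF$date <- as.Date(DF$date, format = "%m-%d-%Y")

# Sum count within each distinct date; no date values are listed by hand.
daily <- aggregate(count ~ date, data = DF, FUN = sum)

# ts() assumes equally spaced observations, so add any missing days
# (assumption: a day without rows means a count of 0).
all_days <- data.frame(date = seq(min(daily$date), max(daily$date), by = "day"))
daily <- merge(all_days, daily, all.x = TRUE)
daily$count[is.na(daily$count)] <- 0

# A plain ts object with one value per day.
tser <- ts(daily$count)
print(daily)
```

The resulting tser can be passed to auto.arima in the same way as the zoo series z.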

Updated: Revised after re-reading the question.
