
In R is there any way to use acf and other time series functions with multiple entities but treat it as univariate

Basically, I would like to have a time series with multiple entities, but have it treated as a single series with multiple resets. An example would be daily price changes for multiple stocks, where I want to calculate time series statistics across all stocks as a group rather than for each individual stock. I can create a matrix for the multiple stocks, but when I do this, functions such as acf treat each stock separately and produce a multi-panel plot comparing the autocorrelation in each series to all the others. I would like just a single plot showing the autocorrelation aggregated across all stocks. Is there any way to do this in R?

` require(lubridate)

n <- 10 # or 500
t <- 100 # or 10000
date <- sort(today() - 1:t)
symbol <- paste0("Company", 1:n)

set.seed(123)

start <- as.integer(runif(n)*(t/2)) #ipo
end <- ifelse(runif(n)<0.8, t, as.integer(start+runif(n)*(t-start+1)-1)) #delisting
lifetime <- end-start+1
volatility <- rchisq(n, 3)*0.003
autoCorrel <- rnorm(1)*0.05

raw <- data.frame()
for (i in 1:n) {
  change=rnorm(lifetime[i])*volatility[i]
  change[-1] <- change[-1] + autoCorrel*change[-length(change)]
  raw <- rbind(raw, data.frame(symbol=rep(symbol[i], lifetime[i]), 
                               date=date[start[i]:end[i]], change=change))
}

print(head(raw))

autocorrel <- function(x, entity, time, n=20) {
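  ord <- order(entity, time)  # reorder entity together with x so the comparisons below stay aligned
  x <- x[ord]
  entity <- entity[ord]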
  sapply(1:n, function(i) {
    len <- length(x)
    data.frame(lag=i, cor=cor(x[1:(len-i)][entity[1:(len-i)]==entity[-(1:i)]], 
                              x[-(1:i)][entity[1:(len-i)]==entity[-(1:i)]]))
  })
}

print(autocorrel(raw$change, raw$symbol, raw$date, 20))

`

In the above example I've written my own autocorrel function and used simulated stock market data, where companies have different date ranges. But the multiple entities could be a number of things, for example multiple weather stations taking observations at the same time of day but at different places. That is why I was wondering whether there is any class of functions, or parameters to functions, that allows for this, or whether it would all need to be hand coded.

1) We can use by. The following plots the autocorrelations for each company and also the mean at each lag (in bold). We wrap the autocorrel function in try, so that bad company data produces an error message but results are still returned for the other companies. (If you don't want the error message, use the silent= argument to try or fix the underlying problem.)

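# same as in question except we use cbind rather than data.frame
# (entity is also reordered together with x so the comparisons stay aligned)
autocorrel <- function(x, entity, time, n=20) {
  ord <- order(entity, time)
  x <- x[ord]
  entity <- entity[ord]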
  sapply(1:n, function(i) {
    len <- length(x)
    cbind(lag=i, cor=cor(x[1:(len-i)][entity[1:(len-i)]==entity[-(1:i)]], 
                              x[-(1:i)][entity[1:(len-i)]==entity[-(1:i)]]))
  })
}

autocorrel_wrap <- function(DF) try(with(DF, autocorrel(change, symbol, date, 20)))
L <- by(raw, raw$symbol, autocorrel_wrap)
L <- L[sapply(L, is.matrix)] # remove bad data

acf.m <- sapply(L, "[", 2, TRUE) # extract correlations
lags <- seq_len(nrow(acf.m)) # autocorrel computes lags 1..n, so the first row is lag 1

matplot(lags, acf.m, type = "o")

# plot mean correlation at each lag
lines(rowMeans(acf.m) ~ lags, lwd = 2)

[Figure: autocorrelation by lag for each company, with the mean across companies drawn as a thicker line]

2) We could model this using gls in the nlme package like this. (This is just an example. There are many models that could be considered in many packages, and the question is pretty open ended.) This code fits a common AR(1) model to the data, except that each series may have a different intercept:

library(nlme)
gls(change ~ symbol - 1, data = raw, correlation = corAR1(form = ~ date | symbol))

giving:

Generalized least squares fit by REML
  Model: change ~ symbol - 1 
  Data: raw 
  Log-restricted-likelihood: 1950.228

Coefficients:
 symbolCompany1  symbolCompany2  symbolCompany3  symbolCompany4  symbolCompany5 
   0.0014916739   -0.0005797750   -0.0002394351    0.0006663767    0.0001541708 
 symbolCompany6  symbolCompany7  symbolCompany8  symbolCompany9 symbolCompany10 
  -0.0003972843    0.0003506882    0.0005941818   -0.0002036649    0.0003041018 

Correlation Structure: AR(1)
 Formula: ~date | symbol 
 Parameter estimate(s):
      Phi 
-0.050775 
Degrees of freedom: 618 total; 608 residual
Residual standard error: 0.009474571 

so the autocorrelations at lags 0, 1, 2, ... are:

> (-0.050775)^(0:10)
 [1]  1.000000e+00 -5.077500e-02  2.578101e-03 -1.309031e-04  6.646603e-06
 [6] -3.374813e-07  1.713561e-08 -8.700606e-10  4.417733e-11 -2.243104e-12
[11]  1.138936e-13
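
As a cross-check of the powers above, the theoretical autocorrelations of an AR(1) process can also be obtained from ARMAacf in the stats package. This is only a small sketch: the value of phi is copied by hand from the printed gls output, and the variable names are illustrative.

```r
# Theoretical ACF of an AR(1) process with the Phi reported by gls.
phi <- -0.050775                            # copied from the printed fit (assumption: not re-estimated here)
acf_ar1 <- ARMAacf(ar = phi, lag.max = 10)  # named vector for lags 0..10
print(acf_ar1)

# For AR(1) this agrees with phi^k at lag k
print(all.equal(unname(acf_ar1), phi^(0:10)))
```

For a pure AR(1) model the two computations agree, so ARMAacf mainly becomes useful if a higher-order AR or ARMA correlation structure is fitted instead.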

Update: The poster later added data, so the answer was modified accordingly. Also added (2) and made some corrections.

I ended up using the autocorrel_wrap function that I wrote in OP. I think the answer to my question is that this functionality doesn't exist in R at the moment and must be hand coded. But I'd like to thank Msr Grothendieck for an elegant workaround to the problem.

A real autocorrelation function for panel data in R is collapse::psacf. It works by first standardizing the data in each group and then computing the autocovariance on the group-standardized panel series using proper panel lagging. The implementation is in C++ and very fast.
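
To make the idea described above concrete, here is a rough base-R sketch of a pooled panel autocorrelation: standardize within each group, then correlate each observation only with lagged observations from the same group. It is not the code used by collapse. The function name panel_acf and its arguments are hypothetical, and it assumes no missing values, at least two observations per group, and regularly spaced observations without gaps (so that a lag of k rows equals a lag of k time steps).

```r
# Pooled (panel) autocorrelation: illustrative sketch only, not collapse's implementation.
# x: numeric values, g: group labels, t: time index, lag.max: largest lag to compute.
panel_acf <- function(x, g, t, lag.max = 20) {
  ord <- order(g, t)                  # make each group contiguous and time-ordered
  x <- x[ord]
  g <- g[ord]
  # standardize within each group (mean 0, sd 1)
  z <- ave(x, g, FUN = function(v) (v - mean(v)) / sd(v))
  n <- length(z)
  denom <- sum(z^2)                   # lag-0 term, so that the lag-0 value is exactly 1
  acf_vals <- sapply(seq_len(lag.max), function(k) {
    i <- seq_len(n - k)
    same <- g[i] == g[i + k]          # drop pairs that cross a group boundary
    sum(z[i][same] * z[i + k][same]) / denom
  })
  data.frame(lag = 0:lag.max, acf = c(1, acf_vals))
}

# Tiny made-up example: two groups of different length, each alternating in sign
x <- c(1, -1, 1, -1, 1, -1, 2, -2, 2, -2)
g <- c(rep("A", 6), rep("B", 4))
t <- c(1:6, 1:4)
res <- panel_acf(x, g, t, lag.max = 3)
print(res)
```

With the simulated data from the question, the call would be `with(raw, panel_acf(change, symbol, date))`, and `plot(acf ~ lag, data = res, type = "h")` gives a single acf-style plot. With collapse installed, the corresponding call would be along the lines of `psacf(raw$change, raw$symbol, raw$date)`; check the package documentation for the exact arguments.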
