Autocorrelation function of binary time series

I have a binary (1 or 0) time series of an event and I want to calculate its ACF. The problem is that I need to split the time series into clusters according to their duration and calculate the ACF of each subset. Here is an example:

TS : (1,1,1,0,0,1,1,0,0,0,1)

I'd like to have an ACF that is a sum of:

ACF of cluster 1: (1,1,1,0,0,0,0,0,0,0,0)

ACF of cluster 2: (1,1,0,0,0,0,0,0,0,0,0)

ACF of cluster 3: (1,0,0,0,0,0,0,0,0,0,0)

and then average these 3 vectors to get the result I need. The number of clusters is arbitrary, and the length of the time series varies from roughly 1k to 10k observations.

It's not clear to me at all what you're trying to do.

  1. In agreement with @OttoKässi, I don't understand the logic behind the subsets. Why three? Why those three? What is the (mathematical) rationale for constructing those subsets?

  2. More fundamentally, averaging correlation coefficients makes little sense to me. In autocorrelation, you calculate Pearson's product-moment correlation coefficients of the vector with different lagged versions of that same vector. Then you want to do that for three different (orthogonal) vectors, and average the coefficients? Why? That makes no statistical sense to me.

That aside, to calculate the autocorrelation for the three vectors you can do the following:

# Your sample vectors
v <- list(
    v1 = c(1,1,1,0,0,0,0,0,0,0,0),
    v2 = c(1,1,0,0,0,0,0,0,0,0,0),
    v3 = c(1,0,0,0,0,0,0,0,0,0,0))

# Calculate acf for lag = 0 ... 10 and store as columns in a data frame
# The rows correspond to lag = 0 ... 10 (row 1 is lag 0)
# Note: the result is named `acf`, like the function; R still resolves
# the call acf(x, ...) to the function
acf <- as.data.frame(lapply(v, function(x) as.numeric(acf(x, plot = FALSE)$acf)))
acf
#            v1          v2           v3
#1   1.00000000  1.00000000  1.000000000
#2   0.63257576  0.47979798 -0.009090909
#3   0.26515152 -0.04040404 -0.018181818
#4  -0.10227273 -0.06060606 -0.027272727
#5  -0.13636364 -0.08080808 -0.036363636
#6  -0.17045455 -0.10101010 -0.045454545
#7  -0.20454545 -0.12121212 -0.054545455
#8  -0.23863636 -0.14141414 -0.063636364
#9  -0.27272727 -0.16161616 -0.072727273
#10 -0.18181818 -0.18181818 -0.081818182
#11 -0.09090909 -0.09090909 -0.090909091
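
To see where these numbers come from, here is a minimal sketch that reproduces the lag-1 value for `v1` by hand. Strictly speaking, `acf` does not compute a separate Pearson correlation for each lagged pair of vectors: by default it centres on the overall mean and normalises by the lag-0 sum of squares. `manual_acf` is a hypothetical helper written only for this illustration (it is not part of base R), and it assumes `0 <= k < length(x)`.

```r
# Hypothetical helper (not part of base R): sample autocorrelation at lag k,
# centred on the overall mean and normalised by the lag-0 sum of squares,
# which is what stats::acf does by default. Assumes 0 <= k < length(x).
manual_acf <- function(x, k) {
  n <- length(x)
  m <- mean(x)
  num <- sum((x[1:(n - k)] - m) * (x[(1 + k):n] - m))
  den <- sum((x - m)^2)
  num / den
}

v1 <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)

manual_lag1  <- manual_acf(v1, 1)
# Element 1 of $acf is lag 0, so element 2 is lag 1
builtin_lag1 <- as.numeric(stats::acf(v1, plot = FALSE)$acf)[2]

# Compare the hand-computed value with the built-in one
all.equal(manual_lag1, builtin_lag1)
```

For `v1` the lag-1 value works out to 167/264, which matches the second row of the `v1` column above.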

If you insist, you can average the correlation coefficients at each lag by taking the row means. Mind you, I still don't see how this makes statistical sense.

rowMeans(acf)
#[1]  1.00000000  0.36776094  0.06885522 -0.06338384 -0.08451178 -0.10563973
#[7] -0.12676768 -0.14789562 -0.16902357 -0.14848485 -0.09090909
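
For a longer series you would not want to type the vectors by hand. The sketch below is one possible reading of the question, not something the question confirms: it assumes each "cluster" is a run of consecutive 1s, moved to the start of a zero-padded vector of the same length as the original series. The names `ts_bin`, `run_lengths`, `clusters`, `acf_by_cluster` and `mean_acf` are made up for this example. It also assumes the series contains at least one 1 and is not all 1s (a constant vector has zero variance, so its ACF is undefined).

```r
# Example series from the question
ts_bin <- c(1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1)
n <- length(ts_bin)

# Run-length encode the series and keep the lengths of the runs of 1s
runs <- rle(ts_bin)
run_lengths <- runs$lengths[runs$values == 1]

# Assumption: each cluster is `len` ones followed by zeros, padded to length n
clusters <- lapply(run_lengths, function(len) c(rep(1, len), rep(0, n - len)))
names(clusters) <- paste0("v", seq_along(clusters))

# One column per cluster, one row per lag (0 ... n - 1)
acf_by_cluster <- sapply(clusters, function(x) {
  as.numeric(stats::acf(x, lag.max = n - 1, plot = FALSE)$acf)
})

# Average across clusters at each lag
mean_acf <- rowMeans(acf_by_cluster)
mean_acf
```

With the example series this recovers the three vectors from the question (run lengths 3, 2 and 1) and the same row means as above. For series of 1k to 10k observations you would probably pass a much smaller `lag.max` than `n - 1`.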
