
How to calculate auto-covariance in Python

I want to calculate the auto-covariance of three arrays X1, X2 and Y, which are all stationary random processes. Is there any function in SciPy or another library that can solve this problem?

According to the standard estimate of the autocovariance coefficient for discrete signals, which can be expressed by the equation:

C_{xx}(k) = \frac{1}{N-1} \sum_{i=1}^{N-k} \left( x(i) - \bar{x} \right) \left( x(i+k) - \bar{x} \right)

...where x(i) is a given signal (i.e. a specific 1D vector), k stands for the shift of the x(i) signal by k samples, N is the length of the x(i) signal, and:

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x(i)

...which is the simple average. We can then write:

'''
Calculate the autocovariance coefficient.
'''

import numpy as np

Xi = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
N = np.size(Xi)
k = 5
Xs = np.average(Xi)

def autocovariance(Xi, N, k, Xs):
    autoCov = 0
    for i in np.arange(0, N-k):
        autoCov += ((Xi[i+k])-Xs)*(Xi[i]-Xs)
    return (1/(N-1))*autoCov

print("Autocovariance:", autocovariance(Xi, N, k, Xs))

If you would like to normalize the autocovariance coefficient, which then becomes the autocorrelation coefficient expressed as:

r_{xx}(k) = \frac{C_{xx}(k)}{C_{xx}(0)}

...then you just have to add two additional lines to the above code:

def autocorrelation():
    return autocovariance(Xi, N, k, Xs) / autocovariance(Xi, N, 0, Xs)

Here is the full script:

'''
Calculate the autocovariance and autocorrelation coefficients.
'''

import numpy as np

Xi = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
N = np.size(Xi)
k = 5
Xs = np.average(Xi)

def autocovariance(Xi, N, k, Xs):
    autoCov = 0
    for i in np.arange(0, N-k):
        autoCov += ((Xi[i+k])-Xs)*(Xi[i]-Xs)
    return (1/(N-1))*autoCov

def autocorrelation():
    return autocovariance(Xi, N, k, Xs) / autocovariance(Xi, N, 0, Xs)

print("Autocovariance:", autocovariance(Xi, N, k, Xs))
print("Autocorrelation:", autocorrelation())
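As a quick sanity check of the script above (a minimal sketch reusing the same estimator), the lag-0 autocorrelation is 1 by construction, and for this series with period 5 the lag-5 autocorrelation is strongly positive:

```python
import numpy as np

# Same estimator as in the script above.
def autocovariance(Xi, N, k, Xs):
    autoCov = 0
    for i in np.arange(0, N - k):
        autoCov += (Xi[i + k] - Xs) * (Xi[i] - Xs)
    return (1 / (N - 1)) * autoCov

Xi = np.array([1, 2, 3, 4, 5] * 3)
N = np.size(Xi)
Xs = np.average(Xi)

# Lag 0 normalises to exactly 1 ...
r0 = autocovariance(Xi, N, 0, Xs) / autocovariance(Xi, N, 0, Xs)
# ... while lag 5 (the period) gives 2/3, not 1, because the fixed
# 1/(N-1) normalisation sums only N-k products at lag k.
r5 = autocovariance(Xi, N, 5, Xs) / autocovariance(Xi, N, 0, Xs)
print(r0, r5)
```

Note that even a perfectly periodic series does not reach 1 at its period with this estimator; the per-lag normalisation in the next answer behaves differently.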

Get the sample autocovariance:

# cov_auto_samp(X,delta)/cov_auto_samp(X,0) = auto correlation
def cov_auto_samp(X,delta):
    N = len(X)
    Xs = np.average(X)
    autoCov = 0.0
    for i in np.arange(0, N - delta):
        autoCov += (X[i + delta] - Xs) * (X[i] - Xs)
    # average over the number of overlapping terms (N - delta)
    return autoCov / (N - delta)
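Because cov_auto_samp divides by the number of overlapping terms at each lag rather than by a fixed N-1, the ratio mentioned in the comment reaches exactly 1 at the period of a perfectly periodic series. A short sketch, reusing the series from the previous answer:

```python
import numpy as np

# Condensed copy of cov_auto_samp: per-lag normalisation.
def cov_auto_samp(X, delta):
    N = len(X)
    Xs = np.average(X)
    autoCov = 0.0
    for i in np.arange(0, N - delta):
        autoCov += (X[i + delta] - Xs) * (X[i] - Xs)
    return autoCov / (N - delta)

X = np.array([1, 2, 3, 4, 5] * 3)
# At the period (lag 5) the autocorrelation is exactly 1.
r = cov_auto_samp(X, 5) / cov_auto_samp(X, 0)
print(r)
```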

A small tweak to the previous answers that avoids Python for loops and uses NumPy array operations instead. This will be quicker if you have a lot of data.

def lagged_auto_cov(Xi,t):
    """
    for series of values x_i, length N, compute empirical auto-cov with lag t
    defined: 1/(N-1) * \sum_{i=0}^{N-t} ( x_i - x_s ) * ( x_{i+t} - x_s )
    """
    N = len(Xi)

    # use sample mean estimate from whole series
    Xs = np.mean(Xi)

    # construct copies of series shifted relative to each other, 
    # with mean subtracted from values
    end_padded_series = np.zeros(N+t)
    end_padded_series[:N] = Xi - Xs
    start_padded_series = np.zeros(N+t)
    start_padded_series[t:] = Xi - Xs

    auto_cov = 1./(N-1) * np.sum( start_padded_series*end_padded_series )
    return auto_cov
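A quick way to convince yourself that the padded-array trick is correct is to compare it against the straightforward loop for several lags (a self-contained check; the loop version mirrors the earlier answers):

```python
import numpy as np

# Vectorised version from the answer above.
def lagged_auto_cov(Xi, t):
    N = len(Xi)
    Xs = np.mean(Xi)
    end_padded_series = np.zeros(N + t)
    end_padded_series[:N] = Xi - Xs
    start_padded_series = np.zeros(N + t)
    start_padded_series[t:] = Xi - Xs
    return 1. / (N - 1) * np.sum(start_padded_series * end_padded_series)

# Plain-loop reference implementation of the same estimator.
def loop_auto_cov(Xi, t):
    N = len(Xi)
    Xs = np.mean(Xi)
    return sum((Xi[i + t] - Xs) * (Xi[i] - Xs) for i in range(N - t)) / (N - 1)

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
ok = all(np.isclose(lagged_auto_cov(x, t), loop_auto_cov(x, t)) for t in range(10))
print(ok)
```

At lag 0 both reduce to the sample variance with ddof=1, which gives another easy spot check.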

Comparing this against @bluevoxel's code, using a time series of 50,000 data points and computing the auto-correlation for a single fixed lag, the Python for-loop code averaged about 30 milliseconds, while the NumPy version averaged under 0.3 milliseconds (running on my laptop).

@user333700 has the right answer. Using a library (such as statsmodels) is generally preferable to writing your own. However, it is insightful to implement it yourself at least once.

def _check_autocovariance_input(x):
    if len(x) < 2:
        raise ValueError('Need at least two elements to calculate autocovariance')

def get_autocovariance_given_lag(x, lag):
    _check_autocovariance_input(x)

    x_centered = x - np.mean(x)
    a = np.pad(x_centered, pad_width=(0, lag), mode='constant')
    b = np.pad(x_centered, pad_width=(lag, 0), mode='constant')
    return np.dot(a, b) / len(x)

def get_autocovariance(x):
    _check_autocovariance_input(x)
    x_centered = x - np.mean(x)
    return np.correlate(x_centered, x_centered, mode='full')[len(x) - 1:] / len(x)
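Assuming both helpers behave as described, the all-lags result should agree with the single-lag helper at every lag. A self-contained consistency sketch (input validation omitted for brevity):

```python
import numpy as np

# Single-lag version: padded arrays aligned so the dot product
# multiplies x_centered[i] with x_centered[i - lag].
def get_autocovariance_given_lag(x, lag):
    x_centered = x - np.mean(x)
    a = np.pad(x_centered, pad_width=(0, lag), mode='constant')
    b = np.pad(x_centered, pad_width=(lag, 0), mode='constant')
    return np.dot(a, b) / len(x)

# All-lags version: np.correlate computes every sliding dot product;
# index len(x) - 1 of the 'full' output is lag 0.
def get_autocovariance(x):
    x_centered = x - np.mean(x)
    return np.correlate(x_centered, x_centered, mode='full')[len(x) - 1:] / len(x)

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
all_lags = get_autocovariance(x)
agree = all(np.isclose(all_lags[lag], get_autocovariance_given_lag(x, lag))
            for lag in range(len(x)))
print(agree)
```

Entry 0 of the all-lags array is the population variance (division by N, not N-1), which is another quick check.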

The function get_autocovariance_given_lag calculates the autocovariance for a given lag.

If you are interested in all lags, get_autocovariance can be used. The np.correlate function is what statsmodels uses under the hood. It calculates the cross-correlation, which is a sliding dot product. For example, suppose the array is [1, 2, 3]. Then we get:

      [1, 2, 3]      = 3 * 1 = 3
[1, 2, 3]

      [1, 2, 3]      = 2 * 1 + 3 * 2 = 8
   [1, 2, 3]


      [1, 2, 3]      = 1 * 1 + 2 * 2 + 3 * 3 = 14
      [1, 2, 3]

      [1, 2, 3]      = 2 * 1 + 3 * 2 = 8
         [1, 2, 3]

      [1, 2, 3]      = 3 * 1 = 3
            [1, 2, 3]

But note we are interested in the covariance starting at lag 0. Where does that occur? It occurs after the sliding window has moved N - 1 positions to the right, where N is the length of the array. This is why we return the array starting at index N - 1.
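The worked example above can be reproduced directly; the five values of the 'full' output are exactly the sliding dot products shown, and slicing from index N - 1 keeps the non-negative lags:

```python
import numpy as np

# All five sliding dot products of [1, 2, 3] with itself.
full = np.correlate([1, 2, 3], [1, 2, 3], mode='full')
print(full)           # [ 3  8 14  8  3]

# N - 1 = 2: keep lags 0, 1, 2 (full overlap onwards).
nonneg_lags = full[3 - 1:]
print(nonneg_lags)    # [14  8  3]
```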
