
How to calculate auto-covariance in Python

I want to calculate the auto-covariance of three arrays X1, X2 and Y, which are all stationary random processes. Is there any function in SciPy or another library that can solve this problem?

According to the standard estimate of the autocovariance coefficient for discrete signals, which can be expressed by the equation:

C_{xx}(k) = \frac{1}{N-1} \sum_{i=1}^{N-k} \left( x(i) - \bar{x} \right) \left( x(i+k) - \bar{x} \right)

...where x(i) is a given signal (i.e. a specific 1D vector), k stands for the shift of the x(i) signal by k samples, N is the length of the x(i) signal, and:

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x(i)

...which is the simple average. We can then write:

'''
Calculate the autocovariance coefficient.
'''

import numpy as np

Xi = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
N = np.size(Xi)
k = 5
Xs = np.average(Xi)

def autocovariance(Xi, N, k, Xs):
    autoCov = 0
    for i in np.arange(0, N-k):
        autoCov += ((Xi[i+k])-Xs)*(Xi[i]-Xs)
    return (1/(N-1))*autoCov

print("Autocovariance:", autocovariance(Xi, N, k, Xs))

If you would like to normalize the autocovariance coefficient, which then becomes the autocorrelation coefficient expressed as:

r_{xx}(k) = \frac{C_{xx}(k)}{C_{xx}(0)}

...then you just have to add two additional lines to the above code:

def autocorrelation():
    return autocovariance(Xi, N, k, Xs) / autocovariance(Xi, N, 0, Xs)

Here is the full script:

'''
Calculate the autocovariance and autocorrelation coefficients.
'''

import numpy as np

Xi = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
N = np.size(Xi)
k = 5
Xs = np.average(Xi)

def autocovariance(Xi, N, k, Xs):
    autoCov = 0
    for i in np.arange(0, N-k):
        autoCov += ((Xi[i+k])-Xs)*(Xi[i]-Xs)
    return (1/(N-1))*autoCov

def autocorrelation():
    return autocovariance(Xi, N, k, Xs) / autocovariance(Xi, N, 0, Xs)

print("Autocovariance:", autocovariance(Xi, N, k, Xs))
print("Autocorrelation:", autocorrelation())
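As a quick sanity check of the script above (a minimal sketch reusing the same estimator), the lag-0 autocorrelation is 1 by construction, and for this series with period 5 the lag-5 autocorrelation is strongly positive:

```python
import numpy as np

# Same estimator as in the script above.
def autocovariance(Xi, N, k, Xs):
    autoCov = 0
    for i in np.arange(0, N - k):
        autoCov += (Xi[i + k] - Xs) * (Xi[i] - Xs)
    return (1 / (N - 1)) * autoCov

Xi = np.array([1, 2, 3, 4, 5] * 3)
N = np.size(Xi)
Xs = np.average(Xi)

# Lag 0 normalises to exactly 1 ...
r0 = autocovariance(Xi, N, 0, Xs) / autocovariance(Xi, N, 0, Xs)
# ... while lag 5 (the period) gives 2/3, not 1, because the fixed
# 1/(N-1) normalisation sums only N-k products at lag k.
r5 = autocovariance(Xi, N, 5, Xs) / autocovariance(Xi, N, 0, Xs)
print(r0, r5)
```

Note that even a perfectly periodic series does not reach 1 at its period with this estimator; the per-lag normalisation in the next answer behaves differently.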

Get the sample autocovariance:

# cov_auto_samp(X,delta)/cov_auto_samp(X,0) = auto correlation
def cov_auto_samp(X,delta):
    N = len(X)
    Xs = np.average(X)
    autoCov = 0.0
    for i in np.arange(0, N - delta):
        autoCov += (X[i + delta] - Xs) * (X[i] - Xs)
    # average over the number of overlapping terms (N - delta)
    return autoCov / (N - delta)
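Because cov_auto_samp divides by the number of overlapping terms at each lag rather than by a fixed N-1, the ratio mentioned in the comment reaches exactly 1 at the period of a perfectly periodic series. A short sketch, reusing the series from the previous answer:

```python
import numpy as np

# Condensed copy of cov_auto_samp: per-lag normalisation.
def cov_auto_samp(X, delta):
    N = len(X)
    Xs = np.average(X)
    autoCov = 0.0
    for i in np.arange(0, N - delta):
        autoCov += (X[i + delta] - Xs) * (X[i] - Xs)
    return autoCov / (N - delta)

X = np.array([1, 2, 3, 4, 5] * 3)
# At the period (lag 5) the autocorrelation is exactly 1.
r = cov_auto_samp(X, 5) / cov_auto_samp(X, 0)
print(r)
```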

A small tweak to the previous answers that avoids Python for loops and uses NumPy array operations instead. This will be quicker if you have a lot of data.

def lagged_auto_cov(Xi,t):
    """
    for series of values x_i, length N, compute empirical auto-cov with lag t
    defined: 1/(N-1) * \sum_{i=0}^{N-t} ( x_i - x_s ) * ( x_{i+t} - x_s )
    """
    N = len(Xi)

    # use sample mean estimate from whole series
    Xs = np.mean(Xi)

    # construct copies of series shifted relative to each other, 
    # with mean subtracted from values
    end_padded_series = np.zeros(N+t)
    end_padded_series[:N] = Xi - Xs
    start_padded_series = np.zeros(N+t)
    start_padded_series[t:] = Xi - Xs

    auto_cov = 1./(N-1) * np.sum( start_padded_series*end_padded_series )
    return auto_cov
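A quick way to convince yourself that the padded-array trick is correct is to compare it against the straightforward loop for several lags (a self-contained check; the loop version mirrors the earlier answers):

```python
import numpy as np

# Vectorised version from the answer above.
def lagged_auto_cov(Xi, t):
    N = len(Xi)
    Xs = np.mean(Xi)
    end_padded_series = np.zeros(N + t)
    end_padded_series[:N] = Xi - Xs
    start_padded_series = np.zeros(N + t)
    start_padded_series[t:] = Xi - Xs
    return 1. / (N - 1) * np.sum(start_padded_series * end_padded_series)

# Plain-loop reference implementation of the same estimator.
def loop_auto_cov(Xi, t):
    N = len(Xi)
    Xs = np.mean(Xi)
    return sum((Xi[i + t] - Xs) * (Xi[i] - Xs) for i in range(N - t)) / (N - 1)

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
ok = all(np.isclose(lagged_auto_cov(x, t), loop_auto_cov(x, t)) for t in range(10))
print(ok)
```

At lag 0 both reduce to the sample variance with ddof=1, which gives another easy spot check.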

Comparing this against @bluevoxel's code, using a time series of 50,000 data points and computing the auto-correlation for a single fixed lag, the Python for-loop code averaged about 30 milliseconds, while the NumPy version averaged under 0.3 milliseconds (running on my laptop).

@user333700 has the right answer. Using a library (such as statsmodels) is generally preferable to writing your own. However, it is insightful to implement it yourself at least once.

def _check_autocovariance_input(x):
    if len(x) < 2:
        raise ValueError('Need at least two elements to calculate autocovariance')

def get_autocovariance_given_lag(x, lag):
    _check_autocovariance_input(x)

    x_centered = x - np.mean(x)
    a = np.pad(x_centered, pad_width=(0, lag), mode='constant')
    b = np.pad(x_centered, pad_width=(lag, 0), mode='constant')
    return np.dot(a, b) / len(x)

def get_autocovariance(x):
    _check_autocovariance_input(x)
    x_centered = x - np.mean(x)
    return np.correlate(x_centered, x_centered, mode='full')[len(x) - 1:] / len(x)
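Assuming both helpers behave as described, the all-lags result should agree with the single-lag helper at every lag. A self-contained consistency sketch (input validation omitted for brevity):

```python
import numpy as np

# Single-lag version: padded arrays aligned so the dot product
# multiplies x_centered[i] with x_centered[i - lag].
def get_autocovariance_given_lag(x, lag):
    x_centered = x - np.mean(x)
    a = np.pad(x_centered, pad_width=(0, lag), mode='constant')
    b = np.pad(x_centered, pad_width=(lag, 0), mode='constant')
    return np.dot(a, b) / len(x)

# All-lags version: np.correlate computes every sliding dot product;
# index len(x) - 1 of the 'full' output is lag 0.
def get_autocovariance(x):
    x_centered = x - np.mean(x)
    return np.correlate(x_centered, x_centered, mode='full')[len(x) - 1:] / len(x)

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
all_lags = get_autocovariance(x)
agree = all(np.isclose(all_lags[lag], get_autocovariance_given_lag(x, lag))
            for lag in range(len(x)))
print(agree)
```

Entry 0 of the all-lags array is the population variance (division by N, not N-1), which is another quick check.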

The function get_autocovariance_given_lag calculates the autocovariance for a given lag.

If you are interested in all lags, get_autocovariance can be used. The np.correlate function is what statsmodels uses under the hood. It calculates the cross-correlation, which is a sliding dot product. For example, suppose the array is [1, 2, 3]. Then we get:

      [1, 2, 3]      = 3 * 1 = 3
[1, 2, 3]

      [1, 2, 3]      = 2 * 1 + 3 * 2 = 8
   [1, 2, 3]


      [1, 2, 3]      = 1 * 1 + 2 * 2 + 3 * 3 = 14
      [1, 2, 3]

      [1, 2, 3]      = 2 * 1 + 3 * 2 = 8
         [1, 2, 3]

      [1, 2, 3]      = 3 * 1 = 3
            [1, 2, 3]

But note we are interested in the covariance starting at lag 0. Where does that occur? It occurs after the sliding window has moved N - 1 positions to the right, where N is the length of the array. This is why we return the array starting at index N - 1.
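The worked example above can be reproduced directly; the five values of the 'full' output are exactly the sliding dot products shown, and slicing from index N - 1 keeps the non-negative lags:

```python
import numpy as np

# All five sliding dot products of [1, 2, 3] with itself.
full = np.correlate([1, 2, 3], [1, 2, 3], mode='full')
print(full)           # [ 3  8 14  8  3]

# N - 1 = 2: keep lags 0, 1, 2 (full overlap onwards).
nonneg_lags = full[3 - 1:]
print(nonneg_lags)    # [14  8  3]
```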
