
How to make a correlation plot with a certain lag of two time series

I am trying to plot the autocorrelation between two time series in order to find a suitable lag. The Python statsmodels.graphics.tsaplots module offers plot_acf for investigating the lagged impact of a time series on itself.

How could I plot this lagged correlation to explore one Time Series impacting another Time Series to understand which lag I should choose?

To clarify: since you are investigating the correlation between two different time series, what you want to calculate is the cross-correlation.

There is no such thing as "autocorrelation between two time series" - autocorrelation means the correlations within one time series across separate lags.

Let's take an example. Suppose one wishes to examine the cross-correlation between sunlight hours and maximum temperature in a location. This process is subject to seasonal lag - whereby maximum temperature will lag the period of maximum sunlight hours.

The cross-correlation is plotted for the data as follows:

# Import libraries
import os

import matplotlib.pyplot as plt
import numpy as np

# Change to the folder that contains weather.csv
path = "directory"
os.chdir(path)

# Variables: x and y are the first two columns of weather.csv
dataset = np.loadtxt("weather.csv", delimiter=",")
x = dataset[:, 0]
y = dataset[:, 1]

# Plot the normalized cross-correlation for lags from -365 to +365
plt.xcorr(x, y, normed=True, usevlines=True, maxlags=365)
plt.title("Sunlight Hours versus Maximum Temperature")
plt.show()

Calculating the cross-correlations across a maximum of 365 lags, here is a plot of the data:

[Figure: cross-correlation plot of sunlight hours versus maximum temperature]

In this instance, the strongest correlation between maximum sunlight hours and maximum air temperature occurs at a lag of approximately 40 days, i.e. this is the shift at which the two time series are most strongly correlated.
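Reading the peak off the plot can also be done numerically. The following is a minimal sketch, not the code used for the plot above: the helper name cross_correlation and the synthetic series are made up for illustration, and the lag convention is that of np.correlate(x, y, "full"), where the value at lag k is the sum over n of x[n + k] * y[n]. Under that convention a negative peak lag means y trails x. The sketch subtracts the mean from both series first; as far as I know plt.xcorr does not do this unless a detrend function is passed.

```python
import numpy as np


def cross_correlation(x, y, maxlags):
    """Normalized cross-correlation of x and y for lags -maxlags..maxlags.

    Hypothetical helper for illustration. The value at lag k is
    sum_n x[n + k] * y[n] divided by sqrt(sum(x**2) * sum(y**2)),
    computed after removing the mean of each series.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    n = len(x)
    if not 0 < maxlags < n:
        raise ValueError("maxlags must be between 1 and len(x) - 1")

    # Remove the mean so a constant offset does not dominate the result
    x = x - x.mean()
    y = y - y.mean()

    # Index i of the full output corresponds to lag i - (n - 1)
    full = np.correlate(x, y, mode="full")
    full = full / np.sqrt(np.dot(x, x) * np.dot(y, y))

    lags = np.arange(-maxlags, maxlags + 1)
    return lags, full[n - 1 - maxlags : n + maxlags]


# Synthetic example (made-up data): y is x delayed by 40 samples
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.roll(x, 40)

lags, cc = cross_correlation(x, y, maxlags=100)
best_lag = int(lags[np.argmax(np.abs(cc))])
print("lag with the strongest correlation:", best_lag)
```

With real data, x and y would be the two columns loaded from the file, and the sign of best_lag tells you which series leads under the convention stated above.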

In your case, I would recommend plotting cross-correlation between the two time series to determine if a lag is present, and if so by how many time periods.

@Michael Grogan (https://stackoverflow.com/users/7094244/michael-grogan), thank you for explaining the difference between "autocorrelation" and "cross-correlation". I would suggest making the plot more "statistical", for example like this one I made:

plt.xcorr(TS1, TS2, usevlines=True, maxlags=20, normed=True, lw=2)
plt.grid(True)
plt.axhline(0.2, color='blue', linestyle='dashed', lw=2)
plt.ylim([0, 0.3])
plt.title("Cross-correlation")

[Figure: cross-correlation plot]

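As you can see from the plot, I have a very special case with almost no correlation. Ideally, you should rewrite

plt.ylim([0, 0.3])

as

plt.ylim([0, 1])

to see the full range of positive correlations (use plt.ylim([-1, 1]) if negative correlations matter as well). A correlation of 0.2 or more is sometimes treated as a rule-of-thumb threshold, but whether a value is statistically significant depends on the number of observations.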
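Rather than fixing the dashed line at 0.2, a common approximation is to draw it at 1.96 / sqrt(N), the approximate 95% bound for the sample correlation of two uncorrelated series with N observations. The sketch below only illustrates that approximation; the function name approx_significance_bound is hypothetical, and the bound assumes the series behave like white noise, which trending or seasonal data do not.

```python
import math


def approx_significance_bound(n_obs, z=1.96):
    """Approximate two-sided bound for a sample correlation under the
    null hypothesis of no correlation (z=1.96 gives roughly 95%)."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return z / math.sqrt(n_obs)


# With 100 observations the bound is about 0.196, close to the 0.2 line;
# with 1000 observations it drops to about 0.062.
bound_100 = approx_significance_bound(100)
bound_1000 = approx_significance_bound(1000)
print(round(bound_100, 3), round(bound_1000, 3))
```

The result could replace the hard-coded value in the plot, e.g. plt.axhline(approx_significance_bound(len(TS1)), color='blue', linestyle='dashed', lw=2).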
