
How to make a correlation plot with a certain lag of two time series

I am trying to plot the autocorrelation between two time series in order to find a suitable lag. The Python statsmodels.graphics.tsaplots module offers plot_acf for investigating the lagged impact of a time series on itself.

How can I plot this lagged correlation for one time series impacting another, so that I can see which lag I should choose?

To clarify: since you are investigating the correlations between two different time series, what you are attempting to calculate is the cross-correlation.

There is no such thing as "autocorrelation between two time series". Autocorrelation means the correlation of one time series with itself across different lags.
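To make the distinction concrete, here is a minimal sketch (not part of the original answer) that computes the Pearson correlation between x at time t and y at time t + lag with NumPy. The helper name lagged_corr and the synthetic data are assumptions for illustration only; autocorrelation is simply the special case where both arguments are the same series.

```python
import numpy as np


def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        # Pair x[t] with y[t + lag]
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        # Pair x[t] with y[t - |lag|]
        x, y = x[-lag:], y[:lag]
    return np.corrcoef(x, y)[0, 1]


# Synthetic example: y is a copy of x delayed by 3 steps
rng = np.random.default_rng(0)
base = rng.standard_normal(203)
x = base[3:]    # x[t] = base[t + 3]
y = base[:-3]   # y[t] = base[t], so y[t + 3] = x[t]

cross = lagged_corr(x, y, 3)  # cross-correlation of x and y at lag 3
auto = lagged_corr(x, x, 3)   # autocorrelation of x at lag 3
print(cross, auto)
```

In this sketch a positive lag means the second series trails the first; other tools may use the opposite sign convention.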

Let's take an example. Suppose one wishes to examine the cross-correlation between sunlight hours and maximum temperature at a location. This process is subject to seasonal lag, whereby maximum temperature lags the period of maximum sunlight hours.

The cross-correlation for the data is plotted as follows:

# Import libraries
import os

import numpy as np
import matplotlib.pyplot as plt

# Change to the folder that contains weather.csv ("directory" is a placeholder)
path = "directory"
os.chdir(path)

# Load the two series from columns 0 and 1 of weather.csv
dataset = np.loadtxt("weather.csv", delimiter=",")
x = dataset[:, 0]
y = dataset[:, 1]

# Plot the normalized cross-correlation for lags from -365 to +365
plt.xcorr(x, y, normed=True, usevlines=True, maxlags=365)
plt.title("Sunlight Hours versus Maximum Temperature")
plt.show()

Calculating the cross-correlations across a maximum of 365 lags, here is a plot of the data:

[Image: sunlight]

In this instance, the strongest correlation between sunlight hours and maximum air temperature occurs at a lag of approximately 40 days, i.e. this is the lag at which the two time series are most strongly correlated.

In your case, I would recommend plotting the cross-correlation between the two time series to determine whether a lag is present and, if so, by how many time periods.
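The lag can also be read off numerically rather than by eye. The sketch below is an illustration under stated assumptions, not the answerer's code: it mirrors the normalized calculation behind plt.xcorr (np.correlate in "full" mode divided by the square root of the product of the two sums of squares), but subtracts the means first, because plt.xcorr does not detrend by default. The function name peak_lag and the synthetic series are hypothetical. Note the sign convention: in this formulation a peak at a negative lag means the second series trails the first.

```python
import numpy as np


def peak_lag(x, y, maxlags):
    """Return (lag, value) of the largest normalized cross-correlation.

    Follows the convention of np.correlate(x, y, "full"):
    c[k] = sum_n x[n + k] * y[n], so a peak at k < 0 means y trails x.
    Assumes x and y have equal length and maxlags < len(x).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean()  # remove the mean so the result is a true correlation
    y = y - y.mean()
    n = len(x)
    c = np.correlate(x, y, mode="full") / np.sqrt(np.dot(x, x) * np.dot(y, y))
    lags = np.arange(-maxlags, maxlags + 1)
    c = c[n - 1 - maxlags : n + maxlags]  # keep only lags in [-maxlags, maxlags]
    best = int(np.argmax(c))
    return int(lags[best]), float(c[best])


# Synthetic check: y is x delayed by 7 steps
rng = np.random.default_rng(1)
base = rng.standard_normal(507)
x = base[7:]
y = base[:-7]

lag, value = peak_lag(x, y, maxlags=20)
print(lag, round(value, 2))
```

For the sunlight and temperature example, calling peak_lag(x, y, maxlags=365) on the two loaded columns would return the lag at which the correlation peaks within that window.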

https://stackoverflow.com/users/7094244/michael-grogan, thank you for the explanation of "autocorrelation" and "cross-correlation". I would suggest making your plot look more "statistical", for example like this one I made:

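import matplotlib.pyplot as plt

# TS1 and TS2 are the two time series (equal-length 1-D arrays)
plt.xcorr(TS1, TS2, usevlines=True, maxlags=20, normed=True, lw=2)
plt.grid(True)
plt.axhline(0.2, color='blue', linestyle='dashed', lw=2)  # reference line at 0.2
plt.ylim([0, 0.3])
plt.title("Cross-correlation")
plt.show()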

[Image: cross-correlation plot]

As you can see from the plot, I have a very special case with almost no correlation. Ideally, you should rewrite

plt.ylim([0, 0.3])

as

plt.ylim([0, 1])

to see the full range of positive correlations (use [-1, 1] to include negative ones as well). As a rule of thumb, a correlation of >= 0.2 is often considered statistically significant, although the actual threshold depends on the length of the series.
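Whether a value such as 0.2 is significant depends on how many observations the series contain. A widely used approximation, which assumes the two series are independent white noise, places the 95% bounds at about ±1.96/√N for N observations. The sketch below (the function name is an assumption) computes that bound, so it can be passed to plt.axhline in place of a fixed 0.2.

```python
import math


def approx_significance_bound(n_obs, z=1.96):
    """Approximate two-sided bound for the sample cross-correlation of
    two independent white-noise series of length n_obs (z=1.96 ~ 95%)."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return z / math.sqrt(n_obs)


# Compare the bound for a few series lengths
for n_obs in (50, 100, 365):
    print(n_obs, round(approx_significance_bound(n_obs), 3))
```

With roughly 100 observations the bound is close to 0.2, which matches the rule of thumb; for longer series it is smaller, and for autocorrelated data the approximation can be too optimistic.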


 