
Make a correlation plot on time series data in Python

I want to see the correlation on a rolling weekly basis in time series data, because I want to see how the rolling correlation changes each year. To do so, I tried the built-in pandas.corr() and pandas.rolling_corr() functions to get the rolling correlation and then tried to make a line plot, but I couldn't get the correlation line chart right. I don't know how I should aggregate the time series to get a rolling correlation line chart. Does anyone know a way to do this in Python? Is there any workaround to get a rolling correlation line chart from time series data in pandas? Any ideas?

My attempt:

I tried using pandas.corr() to get the correlation, but it did not help me generate a rolling correlation line chart. So here is my new attempt, but it is not working. I assume I need to find the right way to aggregate the data to make a rolling correlation line chart.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

url = 'https://gist.githubusercontent.com/adamFlyn/eb784c86c44fd7ed3f2504157a33dc23/raw/79b6aa4f2e0ffd1eb626dffdcb609eb2cb8dae48/corr.csv'
df = pd.read_csv(url)
df['date'] = pd.to_datetime(df['date'])

def get_corr(df, window=4):
    dfs = []
    for key, value in df:
        value["ROLL_CORR"] = pd.rolling_corr(value["prod_A_price"],value["prod_B_price"], window)
        dfs.append(value)
    df_final = pd.concat(dfs)
    return df_final

corr_df = get_corr(df, window=12)

fig, ax = plt.subplots(figsize=(7, 4), dpi=144)
sns.lineplot(x='week', y='ROLL_CORR', hue='year', data=corr_df,alpha=.8)
plt.show()
plt.close()

Doing it this way does not work for me. What I want is to see how the rolling correlation moves each year. Can anyone point me to a way of making a rolling correlation line chart from time series data in Python? Any thoughts?

Desired output

Here is the rolling correlation line chart that I want to get. Note that this plot was generated in MS Excel. Is there any way of doing this in Python, or any workaround to get a rolling correlation line chart from time series data? How should I correct my current attempt to get the desired output? Any thoughts?
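One way to correct the attempt itself while keeping its per-year idea is sketched below. It assumes the gist CSV has the columns date, year, week, prod_A_price and prod_B_price, as the question's code implies; because the CSV may not be reachable, the sketch builds a hypothetical stand-in frame of random weekly prices with those names. Two changes matter: iterating over a DataFrame directly yields column labels rather than groups, so the loop iterates over df.groupby('year') instead, and pd.rolling_corr no longer exists in current pandas, so Series.rolling(window).corr(other) takes its place.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the gist CSV: three years of weekly prices with the
# question's column names. The values are random, not the real data.
rng = np.random.default_rng(0)
dates = pd.date_range("2018-01-07", periods=156, freq="W")
df = pd.DataFrame({
    "date": dates,
    "year": dates.year,
    "prod_A_price": 100 + rng.normal(0, 1, len(dates)).cumsum(),
    "prod_B_price": 100 + rng.normal(0, 1, len(dates)).cumsum(),
})
df["week"] = df.groupby("year").cumcount() + 1  # week number within each year


def get_corr(df, window=4):
    """Rolling Pearson correlation of the two prices, computed separately per year."""
    dfs = []
    # df.groupby(...) yields (key, sub-frame) pairs; iterating df itself only yields column labels.
    for year, value in df.sort_values("date").groupby("year"):
        value = value.copy()  # explicit copy so the assignment below does not warn
        # pd.rolling_corr is gone from current pandas; Rolling.corr replaces it.
        value["ROLL_CORR"] = value["prod_A_price"].rolling(window).corr(value["prod_B_price"])
        dfs.append(value)
    return pd.concat(dfs)


corr_df = get_corr(df, window=12)
```

The resulting corr_df can then go into the question's sns.lineplot(x='week', y='ROLL_CORR', hue='year', data=corr_df) call to draw one line per year. Grouping by year restarts the window every January, so the first window - 1 weeks of each year are NaN; if windows should span year boundaries, compute the rolling correlation over the full date-sorted series first and only then split by year.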

Using your code and description as a starting point: pandas' Rolling class has an apply function which can be leveraged ( https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.rolling.Rolling.apply.html#pandas.core.window.rolling.Rolling.apply ).

Two tricks are involved to make the code work:

  1. Accessing the whole row in the applied function ( Pandas rolling apply using multiple columns ): with raw=False, each window is passed as a Series, and its index is used to look up the matching rows of df
  2. We call the rolling function on a single pandas.Series (here df['week'] ) to avoid running the applied function once per column
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

url = 'https://gist.githubusercontent.com/adamFlyn/eb784c86c44fd7ed3f2504157a33dc23/raw/79b6aa4f2e0ffd1eb626dffdcb609eb2cb8dae48/corr.csv'
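df = pd.read_csv(url)
df['date'] = pd.to_datetime(df['date'])  # so sort_values(by='date') sorts chronologically

def get_corr(ser):
    # ser holds one window of 'week' values; its index selects the matching rows of df
    rolling_df = df.loc[ser.index]
    return rolling_df['prod_A_price'].corr(rolling_df['prod_B_price'])

# raw=False (the default) passes each window as a Series, so ser.index is available
df['ROLL_CORR'] = df['week'].rolling(4).apply(get_corr, raw=False)

# Average the rolling correlation per week over the first `number_years` years
number_years = 3
avg_rows = []
for week, df_week in df.groupby('week'):
    avg_rows.append({
        'week': week,
        'year': f'{number_years} year avg',
        'ROLL_CORR': df_week.sort_values(by='date').head(number_years)['ROLL_CORR'].mean()
    })
# DataFrame.append was removed in pandas 2.0, so concatenate the average rows instead
df = pd.concat([df, pd.DataFrame(avg_rows)], ignore_index=True)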

fig, ax = plt.subplots(figsize=(7, 4), dpi=144)
sns.lineplot(x='week', y='ROLL_CORR', hue='year', data=df,alpha=.8)
plt.show()
plt.close()
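For comparison, pandas also provides Rolling.corr, which computes a pairwise rolling correlation directly without the apply trick. The sketch below uses hypothetical random data with the question's column names and checks that both approaches give the same values for a 4-row window.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the question's column names (random values).
rng = np.random.default_rng(1)
n = 40
df = pd.DataFrame({
    "week": np.arange(1, n + 1),
    "prod_A_price": 100 + rng.normal(0, 1, n).cumsum(),
    "prod_B_price": 100 + rng.normal(0, 1, n).cumsum(),
})


def get_corr(ser):
    # Same trick as the answer: the window's index selects the rows holding both prices.
    rolling_df = df.loc[ser.index]
    return rolling_df["prod_A_price"].corr(rolling_df["prod_B_price"])


# Answer's approach: Rolling.apply on one column, with raw=False so ser keeps its index.
via_apply = df["week"].rolling(4).apply(get_corr, raw=False)

# Built-in alternative: Rolling.corr pairs each window of A with the same window of B.
via_corr = df["prod_A_price"].rolling(4).corr(df["prod_B_price"])

print(np.allclose(via_apply, via_corr, equal_nan=True))
```

Rolling.corr should be simpler and faster here, since it avoids a Python call per window; the apply trick stays useful when the per-window statistic needs several columns and has no built-in rolling equivalent.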

Here is the image generated by seaborn:

With the 3-year average:
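The per-week averaging loop in the answer can also be written as one groupby chain. The sketch below assumes a frame that already holds date, year, week and ROLL_CORR columns (here filled with hypothetical random correlations for three years of weekly data), keeps the earliest number_years rows per week as the answer's sort_values(...).head(...) does, and averages them. Converting year to strings is an added assumption, so that the hue column does not mix integers with the '3 year avg' label.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: three years x 52 weeks of already-computed rolling correlations.
rng = np.random.default_rng(2)
dates = pd.date_range("2018-01-07", periods=156, freq="W")
df = pd.DataFrame({
    "date": dates,
    "year": dates.year.astype(str),  # strings, so every hue label has the same type
    "ROLL_CORR": rng.uniform(-1, 1, len(dates)),
})
df["week"] = df.groupby("year").cumcount() + 1  # week number within each year

number_years = 3
# Keep the earliest `number_years` rows of each week, then average ROLL_CORR per week.
avg = (
    df.sort_values("date")
      .groupby("week")
      .head(number_years)
      .groupby("week", as_index=False)["ROLL_CORR"]
      .mean()
      .assign(year=f"{number_years} year avg")
)
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
plot_df = pd.concat([df, avg], ignore_index=True)
```

plot_df can then be passed as data= to the answer's sns.lineplot call.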
