How to plot a line graph of the correlation between variables over time with Pandas?
I have three variables (x, y and z) collected at different times (30, 60 and 120 days). For each collection day, I have a dataframe of the correlations between the three variables.
I would like to plot a line graph to understand how the correlation between each pair of variables behaves over time.
The X axis should show the times (30, 60 and 120 days) and the Y axis the correlation value for each pair of variables, without repeated combinations or self-correlations (1.00). That is, only the correlations between x and y, x and z, and y and z.
Below is a reproducible example of the three dataframes I have.
```python
import pandas as pd

day30_dict = {
    "Index": [
        "x30, y30",
        "x30, z30",
        "x30, x30",
        "y30, x30",
        "y30, z30",
        "y30, y30",
        "z30, x30",
        "z30, y30",
        "z30, z30",
    ],
    "cor": [0.50, 0.11, 1.00, 0.50, 0.22, 1.00, 0.11, 0.22, 1.00],
}
day30_df = pd.DataFrame(day30_dict)
day30_df = day30_df.set_index("Index")

day60_dict = {
    "Index": [
        "x60, y60",
        "x60, z60",
        "x60, x60",
        "y60, x60",
        "y60, z60",
        "y60, y60",
        "z60, x60",
        "z60, y60",
        "z60, z60",
    ],
    "cor": [0.10, 0.15, 1.00, 0.10, 0.77, 1.00, 0.15, 0.77, 1.00],
}
day60_df = pd.DataFrame(day60_dict)
day60_df = day60_df.set_index("Index")

day120_dict = {
    "Index": [
        "x120, y120",
        "x120, z120",
        "x120, x120",
        "y120, x120",
        "y120, z120",
        "y120, y120",
        "z120, x120",
        "z120, y120",
        "z120, z120",
    ],
    "cor": [0.01, 0.03, 1.00, 0.01, 0.90, 1.00, 0.03, 0.90, 1.00],
}
day120_df = pd.DataFrame(day120_dict)
day120_df = day120_df.set_index("Index")
```
Here is one way to do it with Pandas `drop_duplicates`, string methods, and `concat`:
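```python
# For each dataframe: remove self-correlations and duplicate pairs,
# add a numeric "time" column, and strip the day suffix from the labels.
dfs = []
for df in [day30_df, day60_df, day120_df]:
    df = df[df["cor"] < 1].reset_index()
    # Order-independent, hashable key so "x30, y30" and "y30, x30" match
    # (a plain sorted() returns a list, which drop_duplicates cannot hash)
    df["temp"] = df["Index"].apply(lambda s: tuple(sorted(s.split(", "))))
    df = df.drop_duplicates("temp").drop(columns="temp")
    df["time"] = df["Index"].str.extract(r"(\d+)", expand=False).astype(int)
    df["Index"] = df["Index"].str.replace(r"\d+", "", regex=True)
    dfs.append(df)

# Stack the three dataframes; sort by pair, then by day, so each line plots in time order
new_df = pd.concat(dfs).sort_values(["Index", "time"], ignore_index=True)
```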
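The de-duplication step relies on treating "x, y" and "y, x" as the same pair. Here is a minimal sketch of that idea in isolation, on toy data made up for illustration (the `pairs`, `names` and `key` names are not part of the answer above). As an assumption that differs from the filter on `cor < 1`, it drops self-pairs by comparing variable names, so a genuine correlation of exactly 1.00 between two different variables would be kept:

```python
import pandas as pd

# Hypothetical toy data: every ordered pair appears, including self-pairs.
pairs = pd.DataFrame(
    {
        "Index": ["x, y", "x, z", "x, x", "y, x", "y, z", "y, y", "z, x", "z, y", "z, z"],
        "cor": [0.50, 0.11, 1.00, 0.50, 0.22, 1.00, 0.11, 0.22, 1.00],
    }
)

# Split each label into its two variable names (columns 0 and 1).
names = pairs["Index"].str.split(", ", expand=True)

# Keep only rows whose two names differ, i.e. drop self-correlations.
pairs = pairs[names[0] != names[1]].copy()

# Order-independent, hashable key: ("x", "y") for both "x, y" and "y, x".
pairs["key"] = pairs["Index"].apply(lambda s: tuple(sorted(s.split(", "))))

# Keep the first occurrence of each unordered pair.
unique_pairs = pairs.drop_duplicates("key").drop(columns="key").reset_index(drop=True)
print(unique_pairs)
```

A tuple is used for the key because `drop_duplicates` needs hashable values; a list would not work here.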
Then, running the following code in a Jupyter notebook cell:
```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 6))
# Draw one line per variable pair on the same axes
for i, df_ in new_df.groupby("Index"):
    df_.plot(x="time", y="cor", label=i, ax=ax)
```
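Outputs a single figure with one line per pair of variables (x, y; x, z; y, z), with time on the X axis and the correlation on the Y axis.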
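As a possible alternative to looping over groups, the long table could be pivoted so that each pair becomes its own column. The sketch below assumes a table shaped like `new_df` with a numeric `time` column; `long_df` and `wide` are illustrative names, and the values are copied from the question's example data:

```python
import pandas as pd

# Hypothetical long-format table shaped like `new_df`
# (values copied from the question's example data).
long_df = pd.DataFrame(
    {
        "Index": ["x, y"] * 3 + ["x, z"] * 3 + ["y, z"] * 3,
        "time": [30, 60, 120] * 3,
        "cor": [0.50, 0.10, 0.01, 0.11, 0.15, 0.03, 0.22, 0.77, 0.90],
    }
)

# One row per time point, one column per variable pair.
wide = long_df.pivot(index="time", columns="Index", values="cor")
print(wide)
```

Since `DataFrame.plot` draws each column as a separate line against the index, calling `wide.plot(marker="o")` on this table should give one line per pair in a single call.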