How to plot a line graph of the correlation between variables over time with Pandas?

Question

I have three variables (x, y and z) collected at different times (30, 60 and 120 days). I have one correlation dataframe for the three variables per collection day.

I would like to plot a line graph to understand how the correlation between each pair of variables behaves over time.

The X axis should show the times (30, 60 and 120 days) and the Y axis the correlation values for each pair of variables, without repeating mirrored combinations or including the correlation of a variable with itself (1.00). That is, only the correlations between x and y, x and z, and y and z.

Below is a reproducible example of the three dataframes I have.

import pandas as pd

day30_dict = {
    "Index": [
        "x30, y30",
        "x30, z30",
        "x30, x30",
        "y30, x30",
        "y30, z30",
        "y30, y30",
        "z30, x30",
        "z30, y30",
        "z30, z30",
    ],
    "cor": [0.50, 0.11, 1.00, 0.50, 0.22, 1.00, 0.11, 0.22, 1.00],
}

day30_df = pd.DataFrame(day30_dict)
day30_df = day30_df.set_index("Index")

day60_dict = {
    "Index": [
        "x60, y60",
        "x60, z60",
        "x60, x60",
        "y60, x60",
        "y60, z60",
        "y60, y60",
        "z60, x60",
        "z60, y60",
        "z60, z60",
    ],
    "cor": [0.10, 0.15, 1.00, 0.10, 0.77, 1.00, 0.15, 0.77, 1.00],
}

day60_df = pd.DataFrame(day60_dict)
day60_df = day60_df.set_index("Index")

day120_dict = {
    "Index": [
        "x120, y120",
        "x120, z120",
        "x120, x120",
        "y120, x120",
        "y120, z120",
        "y120, y120",
        "z120, x120",
        "z120, y120",
        "z120, z120",
    ],
    "cor": [0.01, 0.03, 1.00, 0.01, 0.90, 1.00, 0.03, 0.90, 1.00],
}

day120_df = pd.DataFrame(day120_dict)
day120_df = day120_df.set_index("Index")

Answer 1

Here is one way to do it with Pandas drop_duplicates, string methods, and concat:

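# For each dataframe: drop self-correlations and mirrored duplicates,
# add a numeric "time" column, and strip the day suffix from the Index labels.
# Collect the cleaned dataframes in a list.
dfs = []
for df in [day30_df, day60_df, day120_df]:
    df = df[df["cor"] < 1].reset_index()
    # Sort the characters into a string (hashable, unlike a list)
    # so that "x30, y30" and "y30, x30" produce the same key
    df["temp"] = df["Index"].apply(lambda s: "".join(sorted(s)))
    df = df.drop_duplicates("temp").drop(columns="temp")
    df["time"] = df["Index"].str.extract(r"(\d+)", expand=False).astype(int)
    df["Index"] = df["Index"].str.replace(r"\d+", "", regex=True)
    dfs.append(df)

# Sort by pair, then by day, so each line is drawn in chronological order
new_df = pd.concat(dfs).sort_values(["Index", "time"], ignore_index=True)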
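As a variation on the loop above, the pair labels can be normalized by splitting and sorting the variable names, and the long table can then be pivoted into one column per pair. This is only a minimal sketch, not the answerer's code: `pair_key`, `long_df` and `wide_df` are hypothetical names, and the data restates the question's values with the day suffix assumed to be already stripped from the labels.

```python
import pandas as pd

# Correlation values from the question, keyed by day (same row order as the question).
# Assumption: labels are written without the day suffix, e.g. "x, y" instead of "x30, y30".
labels = ["x, y", "x, z", "x, x", "y, x", "y, z", "y, y", "z, x", "z, y", "z, z"]
cors = {
    30: [0.50, 0.11, 1.00, 0.50, 0.22, 1.00, 0.11, 0.22, 1.00],
    60: [0.10, 0.15, 1.00, 0.10, 0.77, 1.00, 0.15, 0.77, 1.00],
    120: [0.01, 0.03, 1.00, 0.01, 0.90, 1.00, 0.03, 0.90, 1.00],
}


def pair_key(label):
    """Hypothetical helper: 'y, x' -> ('x', 'y'), so mirrored pairs compare equal."""
    return tuple(sorted(part.strip() for part in label.split(",")))


rows = []
for day, values in cors.items():
    for label, cor in zip(labels, values):
        a, b = pair_key(label)
        if a != b:  # skip self-correlations by name rather than by value
            rows.append({"pair": f"{a}, {b}", "time": day, "cor": cor})

# Mirrored pairs now share a label, so one row per (pair, time) is kept
long_df = pd.DataFrame(rows).drop_duplicates(["pair", "time"])

# One row per day, one column per pair of variables
wide_df = long_df.pivot(index="time", columns="pair", values="cor").sort_index()
print(wide_df)
```

Filtering on the variable names instead of `cor < 1` keeps a pair of distinct variables that happens to correlate at exactly 1.00, which the value-based filter would discard.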

Then, running the following code in a Jupyter notebook cell:

import matplotlib.pyplot as plt

# Draw one line per pair of variables on a shared set of axes
fig, ax = plt.subplots(figsize=(8, 6))
for pair, group in new_df.groupby("Index"):
    group.plot(x="time", y="cor", label=pair, ax=ax)
ax.set_ylabel("cor")

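Outputs a line chart with one line per pair of variables (x and y, x and z, y and z) across the three collection days.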
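If the data is arranged with one column per pair and the days as a numeric index, the same chart can probably be drawn with a single `DataFrame.plot` call, and the X axis is then spaced in proportion to the days. The sketch below is an assumption-laden illustration rather than the answerer's method: `wide_df` is a hypothetical name, and its values simply restate the unique correlations from the question.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Unique pair correlations from the question, one column per pair (hypothetical layout)
wide_df = pd.DataFrame(
    {
        "x, y": [0.50, 0.10, 0.01],
        "x, z": [0.11, 0.15, 0.03],
        "y, z": [0.22, 0.77, 0.90],
    },
    index=pd.Index([30, 60, 120], name="time"),
)

fig, ax = plt.subplots(figsize=(8, 6))
wide_df.plot(ax=ax, marker="o")      # one line per column, x taken from the numeric index
ax.set_xticks(list(wide_df.index))   # tick only at the days that were actually sampled
ax.set_xlabel("Days")
ax.set_ylabel("Correlation")
ax.legend(title="Pair")
```

With a numeric index, the gap between 60 and 120 days is drawn twice as wide as the gap between 30 and 60, which makes the rate of change easier to read than evenly spaced categorical ticks.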

Answer 1
0 2023-01-08 07:38:15
