How to iterate over multiple DataFrames and plot a histogram of each feature, with each dataset's feature in the same graph

Question

I have two DataFrames, df1 and df2:

df1

Age BsHgt_M BsWgt_Kg    GOAT-MBOAT4_F_BM    TCF7L2_M_BM UCP2_M_BM
23.0    1.84    113.0   -1.623634   0.321379    0.199183
23.0    1.68    113.9   -1.073523   -0.957523   0.549469
24.0    1.60    86.4    -0.270883   -0.004106   1.479865
20.0    1.59    99.2    -0.218071   0.568458    -0.398410

df2
Age BsHgt_M BsWgt_Kg    GOAT-MBOAT4_F_BM    TCF7L2_M_BM UCP2_M_BM
29.0    1.94    123.0   -1.623676   0.321379    0.199183
30.0    1.61    113.9   -1.073523   -0.957523   0.549469
44.0    1.30    56.4    -0.270883   -0.004106   1.479865
30.0    1.19    91.2    -0.218071   0.568458    -0.398410

Here I'm iterating over each column of df1 and plotting a histogram for each one, which I can do with the code below:

import matplotlib.pyplot as plt

fig, axs = plt.subplots(len(df1.columns), figsize=(10,50))
for n, col in enumerate(df1.columns):
    df1[col].hist(ax=axs[n],legend=True)

However, I need to iterate over both DataFrames and plot the histograms so that each feature from df1 appears in the same graph as the matching feature from df2. Side-by-side histograms with the same scale would also be fine.

Desired plot

Histogram subplots:

df1['Age'] vs df2['Age']
df1['BsHgt_M'] vs df2['BsHgt_M']
.
.
.

Can anyone enlighten me on how to accomplish this?

Answer 1

IIUC, you could add a new column named ID to both DataFrames and use it as the legend key to tell the histograms apart. Then concatenate the DataFrames row-wise with pd.concat. After that, define your figure and axes, iterate over every column except the newly added one, and plot a histogram with seaborn, splitting the data by the ID column. Making that split is straightforward in seaborn: just pass the hue argument.

Possible code:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Note: next time when asking something on SO, please provide data as code like this,
# it makes it easier for the community to replicate your problem and to help you
df1 = pd.DataFrame({
    "Age": [23, 23, 24, 20],
    "BsHgt_M": [1.84, 1.68, 1.6, 1.59],
    "BsWgt_Kg": [113, 113.9, 86.4, 99.2],
    "GOAT-MBOAT4_F_BM": [-1.623634, -1.073523, -0.270883, -0.218071],
    "TCF7L2_M_BM": [0.321379, -0.957523, -0.004106, 0.568458],
    "UCP2_M_BM": [0.199183, 0.549469, 1.479865, -0.398410]
})

df2 = pd.DataFrame({
    "Age": [29, 30, 44, 30],
    "BsHgt_M": [1.94, 1.61, 1.3, 1.19],
    "BsWgt_Kg": [123, 113.9, 56.4, 91.2],
    "GOAT-MBOAT4_F_BM": [-1.623676, -1.073523, -0.270883, -0.218071],
    "TCF7L2_M_BM": [0.321379, -0.957523, -0.004106, 0.568458],
    "UCP2_M_BM": [0.199183, 0.549469, 1.479865, -0.398410]
})

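# Tag each frame so seaborn can tell the two sources apart via `hue`
df1["ID"] = "df1"
df2["ID"] = "df2"

# Stack the frames row-wise; the feature columns are everything except "ID"
df = pd.concat([df1, df2]).reset_index(drop=True)
cols = df1.columns[:-1]

# Both frames must share the same feature columns, in the same order
assert (cols == df2.columns[:-1]).all()

# One subplot per feature, each overlaying df1 and df2
fig, ax = plt.subplots(len(cols), figsize=(6, 14), sharex=False)
for i, col in enumerate(cols):
    sns.histplot(data=df, x=col, hue="ID", ax=ax[i])
    if i > 0:
        # Keep a single legend, on the first subplot only
        ax[i].legend(list(), frameon=False)
    ax[i].set_ylabel(col)
sns.move_legend(ax[0], "upper left", bbox_to_anchor=(1, 1))
ax[-1].set_xlabel("")
plt.show()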

This code plots histograms for all columns.

For two columns, it would look somewhat like this:

If needed, the style and form can easily be adjusted. This is just one possible solution to your problem and should only serve as a guideline.
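As one example of such an adjustment, here is a minimal sketch that skips seaborn and overlays the histograms with plain matplotlib. It is not part of the approach above, just an alternative under stated assumptions: the helper name plot_feature_histograms, its parameters, and the choice to compute the bin edges from the pooled values of all frames (so the bars of df1 and df2 line up on the same scale) are all mine. The sample data is the first two columns from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_feature_histograms(frames, bins=10, figsize=(6, 14)):
    """Overlay one histogram per DataFrame for every column.

    Hypothetical helper: `frames` maps a legend label to a DataFrame,
    and all DataFrames are assumed to share the same columns.
    """
    labels = list(frames)
    cols = list(frames[labels[0]].columns)
    for label in labels[1:]:
        if list(frames[label].columns) != cols:
            raise ValueError(f"columns of {label!r} do not match {labels[0]!r}")

    # squeeze=False keeps a 2-D array of axes even when there is one column
    fig, axes = plt.subplots(len(cols), figsize=figsize, squeeze=False)
    for ax, col in zip(axes[:, 0], cols):
        # Compute bin edges from the pooled values so every frame uses
        # exactly the same bins for this feature
        pooled = np.concatenate(
            [frames[label][col].dropna().to_numpy() for label in labels]
        )
        edges = np.histogram_bin_edges(pooled, bins=bins)
        for label in labels:
            ax.hist(frames[label][col].dropna(), bins=edges, alpha=0.5, label=label)
        ax.set_ylabel(col)

    # A single legend, placed outside the first subplot
    axes[0, 0].legend(loc="upper left", bbox_to_anchor=(1, 1))
    fig.tight_layout()
    return fig, axes[:, 0]


df1 = pd.DataFrame({
    "Age": [23.0, 23.0, 24.0, 20.0],
    "BsHgt_M": [1.84, 1.68, 1.60, 1.59],
})
df2 = pd.DataFrame({
    "Age": [29.0, 30.0, 44.0, 30.0],
    "BsHgt_M": [1.94, 1.61, 1.30, 1.19],
})

fig, axes = plot_feature_histograms({"df1": df1, "df2": df2}, figsize=(6, 5))
```

Call plt.show() or fig.savefig(...) afterwards to see the result. Because the frames are passed as a dict, the same sketch should work for more than two DataFrames, and neither DataFrame is modified along the way.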
