How to iterate over multiple DataFrames and plot histograms for each feature, with each data set's feature in the same graph
I have two DataFrames, df1 and df2:
df1
Age BsHgt_M BsWgt_Kg GOAT-MBOAT4_F_BM TCF7L2_M_BM UCP2_M_BM
23.0 1.84 113.0 -1.623634 0.321379 0.199183
23.0 1.68 113.9 -1.073523 -0.957523 0.549469
24.0 1.60 86.4 -0.270883 -0.004106 1.479865
20.0 1.59 99.2 -0.218071 0.568458 -0.398410
df2
Age BsHgt_M BsWgt_Kg GOAT-MBOAT4_F_BM TCF7L2_M_BM UCP2_M_BM
29.0 1.94 123.0 -1.623676 0.321379 0.199183
30.0 1.61 113.9 -1.073523 -0.957523 0.549469
44.0 1.30 56.4 -0.270883 -0.004106 1.479865
30.0 1.19 91.2 -0.218071 0.568458 -0.398410
Here I'm trying to iterate over each column of df1 and plot a histogram for it, which I can do with the code below:
import matplotlib.pyplot as plt
fig, axs = plt.subplots(len(df1.columns), figsize=(10,50))
for n, col in enumerate(df1.columns):
    df1[col].hist(ax=axs[n], legend=True)
However, I need to iterate over both DataFrames and plot the histograms so that each feature from df1 appears in the same graph as the corresponding feature from df2. Side-by-side histograms with the same scale would also be fine.
Desired plot
Histogram subplots:
df1['Age'] vs df2['Age']
df1['BsHgt_M'] vs df2['BsHgt_M']
.
.
.
Can anyone enlighten me on how to accomplish this?
IIUC, you could assign a new column named ID to both data frames, which can then be used in the legend to distinguish between the histograms. Then you can concatenate the data frames row-wise using pd.concat. After that, you just need to define your figure and axes, iterate over all columns except the newly assigned one, and plot a histogram with seaborn, using the assigned variable to tell the two data sets apart. Implementing that distinction is straightforward in seaborn: just pass the hue argument.
Possible code:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Note: next time when asking something on SO, please provide data as code like this,
# it makes it easier for the community to replicate your problem and to help you
df1 = pd.DataFrame({
    "Age": [23, 23, 24, 20],
    "BsHgt_M": [1.84, 1.68, 1.6, 1.59],
    "BsWgt_Kg": [113, 113.9, 86.4, 99.2],
    "GOAT-MBOAT4_F_BM": [-1.623634, -1.073523, -0.270883, -0.218071],
    "TCF7L2_M_BM": [0.321379, -0.957523, -0.004106, 0.568458],
    "UCP2_M_BM": [0.199183, 0.549469, 1.479865, -0.398410]
})
df2 = pd.DataFrame({
    "Age": [29, 30, 44, 30],
    "BsHgt_M": [1.94, 1.61, 1.3, 1.19],
    "BsWgt_Kg": [123, 113.9, 56.4, 91.2],
    "GOAT-MBOAT4_F_BM": [-1.623676, -1.073523, -0.270883, -0.218071],
    "TCF7L2_M_BM": [0.321379, -0.957523, -0.004106, 0.568458],
    "UCP2_M_BM": [0.199183, 0.549469, 1.479865, -0.398410]
})
df1["ID"] = "df1"
df2["ID"] = "df2"
df = pd.concat([df1, df2]).reset_index(drop=True)
cols = df1.columns[:-1]
assert (cols == df2.columns[:-1]).all()
fig, ax = plt.subplots(len(cols), figsize=(6, 14), sharex=False)
for i, col in enumerate(cols):
    sns.histplot(data=df, x=col, hue="ID", ax=ax[i])
    # Keep the legend on the first subplot only; blank it on the others
    if i > 0:
        ax[i].legend(list(), frameon=False)
    ax[i].set_ylabel(col)
sns.move_legend(ax[0], "upper left", bbox_to_anchor=(1, 1))
ax[-1].set_xlabel("")
plt.show()
This code plots histograms for all columns.
For two columns, it would look somewhat like this:
[figure: histogram subplots for two columns, with df1 and df2 distinguished by hue]
If needed, the style and layout can easily be adjusted. This is just one possible solution to your problem and should only serve as a guideline.
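Since the question also mentions side-by-side histograms on the same scale, here is a minimal matplotlib-only sketch of that variant. It is not part of the answer above: the helper name plot_paired_histograms and its parameters (labels, bins, side_by_side) are made up for illustration. It assumes both frames share the same numeric column names, and it computes one common set of bin edges per feature so the bars of df1 and df2 are directly comparable.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_paired_histograms(df1, df2, labels=("df1", "df2"), bins=10,
                           side_by_side=False):
    """Hypothetical helper: one subplot row per shared numeric column."""
    # Only numeric columns present in both frames (skips e.g. a string "ID" column).
    cols = [c for c in df1.select_dtypes("number").columns if c in df2.columns]
    ncols = 2 if side_by_side else 1
    # Axes in the same row share x and y limits, so side-by-side panels use one scale.
    fig, axs = plt.subplots(len(cols), ncols,
                            figsize=(5 * ncols, 2.5 * len(cols)),
                            sharex="row", sharey="row", squeeze=False)
    for row, col in zip(axs, cols):
        # Common bin edges computed from the values of both frames together.
        combined = pd.concat([df1[col], df2[col]]).dropna()
        edges = np.histogram_bin_edges(combined, bins=bins)
        # Overlay mode draws both frames on the same axes; otherwise one axes each.
        targets = row if side_by_side else (row[0], row[0])
        for ax, frame, label, color in zip(targets, (df1, df2), labels, ("C0", "C1")):
            ax.hist(frame[col].dropna(), bins=edges, alpha=0.6,
                    color=color, label=label)
            ax.legend()
        row[0].set_ylabel(col)
    fig.tight_layout()
    return fig, axs
```

Calling plot_paired_histograms(df1, df2) followed by plt.show() overlays the two data sets in one subplot per feature; passing side_by_side=True puts df1 in the left panel and df2 in the right panel of each row, with shared axes.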