Pandas histogram plot with kde?
I have a Pandas dataframe (Dt) like this:
Pc Cvt C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
0 1 2 0.08 0.17 0.16 0.31 0.62 0.66 0.63 0.52 0.38
1 2 2 0.09 0.15 0.13 0.49 0.71 1.28 0.42 1.04 0.43
2 3 2 0.13 0.24 0.22 0.17 0.66 0.17 0.28 0.11 0.30
3 4 1 0.21 0.10 0.23 0.08 0.53 0.14 0.59 0.06 0.53
4 5 1 0.16 0.21 0.18 0.13 0.44 0.08 0.29 0.12 0.52
5 6 1 0.14 0.14 0.13 0.20 0.29 0.35 0.40 0.29 0.53
6 7 1 0.21 0.16 0.19 0.21 0.28 0.23 0.40 0.19 0.52
7 8 1 0.31 0.16 0.34 0.19 0.60 0.32 0.56 0.30 0.55
8 9 1 0.20 0.19 0.26 0.19 0.63 0.30 0.68 0.22 0.58
9 10 2 0.12 0.18 0.13 0.22 0.59 0.40 0.50 0.24 0.36
10 11 2 0.10 0.10 0.19 0.17 0.89 0.36 0.65 0.23 0.37
11 12 2 0.19 0.20 0.17 0.17 0.38 0.14 0.48 0.08 0.36
12 13 1 0.16 0.17 0.15 0.13 0.35 0.12 0.50 0.09 0.52
13 14 2 0.19 0.19 0.29 0.16 0.62 0.19 0.43 0.14 0.35
14 15 2 0.01 0.16 0.17 0.20 0.89 0.38 0.63 0.27 0.46
15 16 2 0.09 0.19 0.33 0.15 1.11 0.16 0.87 0.16 0.29
16 17 2 0.07 0.18 0.19 0.15 0.61 0.19 0.37 0.15 0.36
17 18 2 0.14 0.23 0.23 0.20 0.67 0.38 0.45 0.27 0.33
18 19 1 0.27 0.15 0.20 0.10 0.40 0.05 0.53 0.02 0.52
19 20 1 0.12 0.13 0.18 0.22 0.60 0.49 0.66 0.39 0.66
20 21 2 0.15 0.20 0.18 0.32 0.74 0.58 0.51 0.45 0.37
...
From this I want to plot a histogram with a KDE for each column from C1 to C10, in an arrangement just like the one I get when I plot it with pandas:
Dt.iloc[:,2:].hist()
But so far I haven't been able to add the KDE to each histogram. I want something like this: [image]
Any ideas on how to accomplish this?
You want to first plot your histogram, then plot the KDE on a secondary axis.
Minimal, Complete, and Verifiable Example (MCVE)
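import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(1000, 4)).add_prefix('C')

k = len(df.columns)   # total number of subplots
n = 2                 # number of chart columns
m = (k - 1) // n + 1  # number of rows needed

# squeeze=False keeps `axes` 2-D even when there is only one row
fig, axes = plt.subplots(m, n, figsize=(n * 5, m * 3), squeeze=False)

# DataFrame.iteritems() was removed in pandas 2.0; items() replaces it
for i, (name, col) in enumerate(df.items()):
    r, c = i // n, i % n
    ax = axes[r, c]
    col.hist(ax=ax)  # histogram (counts) on the primary y-axis
    # KDE on a secondary y-axis (pandas needs SciPy for plot.kde)
    ax2 = col.plot.kde(ax=ax, secondary_y=True, title=name)
    ax2.set_ylim(0)  # start the density axis at zero

# hide leftover empty subplots when k is not a multiple of n
for ax in axes.flat[k:]:
    ax.set_visible(False)

fig.tight_layout()
plt.show()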
Keep track of the total number of subplots:
k = len(df.columns)
n will be the number of chart columns. Change this to suit individual needs.
m will be the calculated number of required rows based on k and n.
n = 2
m = (k - 1) // n + 1
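The row formula is integer ceiling division: for k ≥ 1, (k - 1) // n + 1 equals ceil(k / n). A quick check, using a hypothetical helper grid_rows that is not part of the original answer:

```python
import math


def grid_rows(k: int, n: int) -> int:
    """Hypothetical helper: rows needed to hold k subplots in n columns."""
    return (k - 1) // n + 1


# Compare the integer formula with math.ceil for a few subplot counts
for k in (1, 2, 3, 4, 5, 10):
    print(k, grid_rows(k, 2), math.ceil(k / 2))
```

For the question's ten columns C1 to C10 with n = 2, this gives five rows.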
Create a figure and an array of axes with the required number of rows and columns:
fig, axes = plt.subplots(m, n, figsize=(n * 5, m * 3))
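One caveat with indexing axes[r, c]: plt.subplots squeezes away length-1 dimensions by default, so with a single row (k ≤ n) axes comes back 1-D and a two-index lookup fails. A minimal sketch of the difference, assuming the headless Agg backend so it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt

# Default squeeze=True: a 1 x 2 grid collapses to a 1-D array of Axes
fig1, axes1 = plt.subplots(1, 2)
print(axes1.shape)

try:
    axes1[0, 0]  # two indices on a 1-D array
except IndexError as exc:
    print("IndexError:", exc)

# squeeze=False always returns a 2-D (rows, cols) array
fig2, axes2 = plt.subplots(1, 2, squeeze=False)
print(axes2.shape)
```

Passing squeeze=False is a cheap way to keep the loop's axes[r, c] indexing valid for any number of columns.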
Iterate through the columns, tracking the column name and which number we are at, i. Within each iteration, plot:
for i, (name, col) in enumerate(df.items()):
    r, c = i // n, i % n
    ax = axes[r, c]
    col.hist(ax=ax)
    ax2 = col.plot.kde(ax=ax, secondary_y=True, title=name)
    ax2.set_ylim(0)
Use tight_layout() as an easy way to tidy up the layout spacing:
fig.tight_layout()
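The histogram-plus-secondary-axis idea can also be sketched for a single column with plain Matplotlib. This is an illustration under stated assumptions, not what pandas does internally: col.plot.kde relies on SciPy's gaussian_kde, whereas the hypothetical helper gaussian_kde_1d below is a hand-written Gaussian KDE using Scott's rule for the bandwidth, and ax.twinx() creates the secondary y-axis explicitly.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt


def gaussian_kde_1d(samples, grid):
    """Hypothetical helper: Gaussian KDE with Scott's rule bandwidth (1-D)."""
    samples = np.asarray(samples, dtype=float)
    h = samples.std(ddof=1) * samples.size ** (-1 / 5)  # Scott's rule, d = 1
    z = (grid[:, None] - samples[None, :]) / h          # shape (grid, samples)
    kernel_sum = np.exp(-0.5 * z**2).sum(axis=1)
    return kernel_sum / (samples.size * h * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
col = rng.standard_normal(1000)

fig, ax = plt.subplots()
ax.hist(col, bins=10)        # counts on the primary (left) y-axis
ax2 = ax.twinx()             # secondary (right) y-axis sharing the x-axis
grid = np.linspace(col.min() - 1, col.max() + 1, 400)
density = gaussian_kde_1d(col, grid)
ax2.plot(grid, density, color="C1")  # density on its own scale
ax2.set_ylim(0)              # anchor the density axis at zero
fig.tight_layout()
```

Keeping counts and density on separate y-axes lets each be read at its natural scale; the alternative is to normalise the histogram (density=True) and draw both on one axis.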
Here is a pure seaborn solution, using FacetGrid.map_dataframe as explained here.
Stealing the example from @piRSquared:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(1000, 4)).add_prefix('C')
Get the data in the required format:
df = df.stack().reset_index(level=1, name="val")
Result:
level_1 val
0 C0 0.879714
0 C1 -0.927096
0 C2 -0.929429
0 C3 -0.571176
1 C0 -1.127939
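To see exactly what the stack/reset_index reshape does, here it is on a tiny hypothetical frame with known values instead of random ones:

```python
import pandas as pd

# Hypothetical two-column frame standing in for the random example
wide = pd.DataFrame({"C0": [1.0, 2.0], "C1": [3.0, 4.0]})

# stack() moves the column labels into a second (unnamed) index level;
# reset_index(level=1, name="val") turns that level into a "level_1" column
long = wide.stack().reset_index(level=1, name="val")
print(long)
```

Each original row now appears once per column, and the column label lands in level_1, which is what the FacetGrid col= argument facets on.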
Then:
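import matplotlib.pyplot as plt
import seaborn as sns

def hist_with_kde(x, **kwargs):
    # map_dataframe makes each facet's Axes current and passes the subset as data=
    ax = plt.gca()
    data = kwargs.pop("data")
    # sns.distplot is deprecated since seaborn 0.11; histplot(kde=True) replaces it
    sns.histplot(data[x], ax=ax, kde=True, stat="density", **kwargs)

# FacetGrid's `size` parameter was renamed to `height` in seaborn 0.9
g = sns.FacetGrid(df, col="level_1", col_wrap=2, height=3.5)
g = g.map_dataframe(hist_with_kde, "val")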
You can adjust col_wrap as needed.