繁体   English   中英

覆盖 pandas KDE 和直方图时正确渲染 y 轴

[英]Render y-axis properly when overlaying pandas KDE and histogram

之前已经提出过类似的问题,但没有将这两个精确的绘图函数一起使用,所以我们在这里:

我有一个来自 pandas DataFrame 的列,我正在绘制直方图和 KDE。 但是,当我 plot 它们时,y 轴使用原始数据值范围而不是离散的样本数/bin(我想要的)。 我怎样才能解决这个问题? 实际的 plot 是完美的,但是 y 轴是错误的。

数据:

t2 = [140547476703.0, 113395471484.0, 158360225172.0, 105497674121.0, 186457736557.0, 153705359063.0, 36826568371.0, 200653068740.0, 190761317478.0, 126529980843.0, 98776029557.0, 132773701862.0, 14780432449.0, 167507656251.0, 121353262386.0, 136377019007.0, 134190768743.0, 218619462126.0, 07912778721.0, 215628911255.0, 147024833865.0, 94136343562.0, 135685803096.0, 165901502129.0, 45476074790.0, 125195690010.0, 113910844263.0, 123134290987.0, 112028565305.0, 93448218430.0, 07341012378.0, 93146854494.0, 132958913610.0, 102326700019.0, 196826471714.0, 122045354980.0, 76591131961.0, 134694468251.0, 120212625727.0, 108456858852.0, 106363042112.0, 193367024628.0, 39578667378.0, 178075400604.0, 155513974664.0, 132834624567.0, 137336282646.0, 125379267464.0]

代码:

fig = plt.figure()
# plot hist + kde
t2[t2.columns[0]].plot.kde(color = "maroon", label = "_nolegend_")
t2[t2.columns[0]].plot.hist(density = True, edgecolor = "grey", color = "tomato", title = t2.columns[0])

# plot mean/stdev
m = t2[t2.columns[0]].mean()
stdev = t2[t2.columns[0]].std()
plt.axvline(m, color = "black", ymax = 0.05, label = "mean")
plt.axvline(m-2*stdev, color = "black", ymax = 0.05, linestyle = ":", label = "+/- 2*Stdev")
plt.axvline(m+2*stdev, color = "black", ymax = 0.05, linestyle = ":")
plt.legend()

现在的样子: 在此处输入图像描述

如果您想要实际计数,则需要将 KDE 扩大为箱的宽度乘以观察次数。 最棘手的部分是访问数据 pandas 用于 plot KDE。 (我已经删除了与图例相关的部分以简化手头的问题)。

import matplotlib.pyplot as plt
import numpy as np

# Calculate KDE, get data
axis = t2[t2.columns[0]].plot.kde(color = "maroon", label = "_nolegend_")
xdata = axis.get_children()[0]._x
ydata = axis.get_children()[0]._y
plt.clf()


# Real figure
fig, ax = plt.subplots(figsize=(7,5))
# Plot Histogram, no density.
x = ax.hist(t2[t2.columns[0]], edgecolor = "grey", color = "tomato")

# size of the bins * N obs
scale = np.diff(x[1])[0]*len(t2)

# Plot scaled KDE
ax.plot(xdata, ydata*scale, color='blue')
ax.set_ylabel('N observations')

plt.show()

在此处输入图像描述

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM