Rendering the y-axis correctly when overlaying a pandas KDE and histogram

Question

Similar questions have been asked before, but none of them used these exact two plotting functions together, so here we are:

I have a column from a pandas DataFrame for which I am plotting both a histogram and the KDE. However, when I plot them, the y-axis appears to use the raw data value range instead of the discrete number of samples per bin (which is what I want). How can I fix this? The plot itself is perfect, but the y-axis is wrong.

Data:

t2 = [140547476703.0, 113395471484.0, 158360225172.0, 105497674121.0, 186457736557.0, 153705359063.0, 36826568371.0, 200653068740.0, 190761317478.0, 126529980843.0, 98776029557.0, 132773701862.0, 14780432449.0, 167507656251.0, 121353262386.0, 136377019007.0, 134190768743.0, 218619462126.0, 07912778721.0, 215628911255.0, 147024833865.0, 94136343562.0, 135685803096.0, 165901502129.0, 45476074790.0, 125195690010.0, 113910844263.0, 123134290987.0, 112028565305.0, 93448218430.0, 07341012378.0, 93146854494.0, 132958913610.0, 102326700019.0, 196826471714.0, 122045354980.0, 76591131961.0, 134694468251.0, 120212625727.0, 108456858852.0, 106363042112.0, 193367024628.0, 39578667378.0, 178075400604.0, 155513974664.0, 132834624567.0, 137336282646.0, 125379267464.0]

Code:

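import matplotlib.pyplot as plt

# t2 is assumed here to be a pandas DataFrame whose first column holds the data above
fig = plt.figure()
# plot hist + kde
t2[t2.columns[0]].plot.kde(color = "maroon", label = "_nolegend_")
t2[t2.columns[0]].plot.hist(density = True, edgecolor = "grey", color = "tomato", title = t2.columns[0])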

# plot mean/stdev
m = t2[t2.columns[0]].mean()
stdev = t2[t2.columns[0]].std()
plt.axvline(m, color = "black", ymax = 0.05, label = "mean")
plt.axvline(m-2*stdev, color = "black", ymax = 0.05, linestyle = ":", label = "+/- 2*Stdev")
plt.axvline(m+2*stdev, color = "black", ymax = 0.05, linestyle = ":")
plt.legend()

What it looks like now: [image]

Answer 1

If you want the real counts, then you'll need to scale the KDE up by the width of the bins multiplied by the number of observations. The KDE integrates to 1, while a count histogram with equal-width bins has a total area of bin width times N, so that factor puts both on the same scale. The trickiest part is accessing the data pandas uses to plot the KDE. (I've removed the parts related to the legend to simplify the problem at hand.)

import matplotlib.pyplot as plt
import numpy as np

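# Calculate KDE, get data
axis = t2[t2.columns[0]].plot.kde(color = "maroon", label = "_nolegend_")
# Read the curve back through the public Line2D API rather than the private _x/_y attributes
xdata, ydata = axis.get_lines()[0].get_data()
# Close the throwaway figure so plt.show() does not display it empty
plt.close(axis.figure)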


# Real figure
fig, ax = plt.subplots(figsize=(7,5))
# Plot Histogram, no density.
x = ax.hist(t2[t2.columns[0]], edgecolor = "grey", color = "tomato")

# size of the bins * N obs
scale = np.diff(x[1])[0]*len(t2)

# Plot scaled KDE
ax.plot(xdata, ydata*scale, color='blue')
ax.set_ylabel('N observations')

plt.show()
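
As an alternative to reading the curve back out of a throwaway pandas plot, the density can be computed directly. The following is a minimal sketch, not the answerer's method: it assumes SciPy is installed and uses `scipy.stats.gaussian_kde` with its default bandwidth. The helper name `hist_with_count_kde`, the grid padding, and the synthetic `sample` data are illustrative assumptions, not part of the original post.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde


def hist_with_count_kde(values, bins=10, ax=None):
    """Draw a count histogram and overlay a KDE rescaled to the count axis.

    Hypothetical helper for illustration; equal-width bins are assumed.
    """
    values = np.asarray(values, dtype=float)
    if ax is None:
        _, ax = plt.subplots(figsize=(7, 5))

    # Histogram in raw counts (no density normalisation).
    counts, edges, _ = ax.hist(values, bins=bins, edgecolor="grey", color="tomato")

    # Evaluate the density on a grid padded by half the data range on each
    # side (an arbitrary choice so the tails of the curve are visible).
    pad = 0.5 * (values.max() - values.min())
    grid = np.linspace(values.min() - pad, values.max() + pad, 1000)
    density = gaussian_kde(values)(grid)

    # The density integrates to 1; multiplying by bin width * N puts it on
    # the same scale as the count histogram.
    bin_width = edges[1] - edges[0]
    scaled = density * bin_width * values.size

    ax.plot(grid, scaled, color="maroon")
    ax.set_ylabel("N observations")
    return ax, counts, edges, grid, scaled


# Synthetic stand-in data (assumption: NOT the data from the question).
rng = np.random.default_rng(0)
sample = rng.normal(loc=1.3e11, scale=4e10, size=48)
ax, counts, edges, grid, scaled = hist_with_count_kde(sample)
```

With the DataFrame from the question, the call would presumably be `hist_with_count_kde(t2[t2.columns[0]])` followed by `plt.show()`. Because the bin width comes from the histogram that is actually drawn, the curve stays on the count scale if `bins` is changed.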

2 · Accepted · 2021-04-27 18:12:56
