
Python seaborn.distplot returning count instead of probability

I have a pandas Series x:

0      -0.000069
1      -0.000059
2      -0.000025
3      -0.000021
4      -0.000021
          ...   
1036    0.000032
1037    0.000033
1038    0.000052
1039    0.000055
1040    0.000092
Name: c, Length: 1041, dtype: float64

I would like to plot a probability density function together with a histogram, for which I used seaborn.distplot:

import matplotlib.pyplot as plt
import seaborn as sns

sns.distplot(x, hist=True, kde=True, bins=100,
             hist_kws={'edgecolor':'black', 'color': 'r'},
             kde_kws={'linewidth': 1, 'color': 'b'})

plt.xlim(-0.00002, 0.00002)
plt.ylim(ymin=0)
plt.xlabel("x")
plt.ylabel("probability")
plt.ticklabel_format(style='sci', axis='x', scilimits=(0,0))

plt.show()

As a result, I get the following figure:

[Figure: output of the code above]

As shown, the vertical axis represents count, but I wanted (and expected from this code) probability. I am quite confused, as identical code works properly for another pandas Series. For example, with the same code and a different series (and different labels, etc.), I was able to produce the following correct figure:

[Figure: the same plot for a different series]

Any idea why this code isn't working for my first series, and/or possible solutions?

The "problem", so to speak, is that you labeled your y-axis "probability" when it is not a probability. The y-axis shows probability density; probability is the area under the curve, and the total area equals 1.

In your first plot, the density is very large but the x-values are very small, so the product of the two (bar height times bin width) is still consistent with a probability. See probability density function for more info.
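To make the height-times-width argument concrete, here is a minimal sketch using numpy only. The data are a synthetic stand-in for the series in the question (an assumption: 1041 normally distributed values on the order of 1e-5), so the exact numbers will differ from the original plot.

```python
import numpy as np

# Synthetic stand-in for the series in the question (an assumption):
# 1041 values on the order of 1e-5.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=3e-5, size=1041)

# density=True normalizes the histogram so that its total area is 1.
density, edges = np.histogram(x, bins=100, density=True)
widths = np.diff(edges)

# Bar heights are large because the bins are extremely narrow...
max_density = density.max()

# ...but each bar's probability is height * width, and these sum to 1.
bin_probability = density * widths
total_probability = bin_probability.sum()

print(f"largest density value:   {max_density:.3g}")
print(f"largest bin probability: {bin_probability.max():.3g}")
print(f"total probability:       {total_probability:.6f}")
```

The density values can be far greater than 1 while every per-bin probability stays between 0 and 1, which is why the axis looks like a count at first glance.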

I would remove your plt.ylabel("probability") and either give the axis the correct label or not label it at all.

I recommend using plt.ylabel("probability density").
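If what you actually want on the vertical axis is the probability of falling in each bin, one common approach is to weight every observation by 1/N, so the bar heights become fractions that sum to 1. The sketch below uses plain matplotlib rather than distplot, and the data are again a synthetic stand-in (an assumption), not the original series.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the series in the question (an assumption).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=3e-5, size=1041)

# Weighting each observation by 1/N makes every bar height equal to the
# fraction of observations that fall in that bin.
weights = np.full(x.shape, 1.0 / x.size)

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(x, bins=100, weights=weights,
                            color="r", edgecolor="black")
ax.set_xlabel("x")
ax.set_ylabel("probability")
ax.ticklabel_format(style="sci", axis="x", scilimits=(0, 0))

# Call plt.show() to display the figure.
```

A KDE curve is a density, so it does not share this axis; overlaying it on per-bin probabilities would put it on the wrong scale. Recent seaborn versions also provide histplot with a stat parameter ("density" or "probability"), which may be a more direct route if you prefer to stay within seaborn.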
