How to change yticks in the seaborn.distplot() from normalised values to absolute values?

I am trying to create a Gaussian curve (without the bar charts) using the seaborn.distplot() method. Unfortunately, I get normalised values on the y-axis instead of the absolute values. How can I resolve this issue?

Here's my code:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

height_mu = 165
height_sigma = 15
heights = np.random.normal(height_mu, height_sigma, size=10000)

plt.figure(figsize=(20, 5))
sns.distplot(heights, hist=False)  # KDE only: the y-axis shows a density, not counts
plt.axvline(165, color='red', label='Mean height (in cm)', linewidth=2)
plt.ylabel("Number of observations")
plt.legend()
plt.grid(which='major', axis='y', color='lightgrey')
plt.show()

There's no option inside seaborn to revert to counts: once kde is turned on, distplot treats norm_hist as True (a normalised histogram is implied whenever a KDE is plotted). Strictly speaking, when a Gaussian kernel is applied you get a probability density. Its values are "per unit of x", so they depend on the units of the data rather than on the number of observations, and they can be greater than 1.
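To illustrate the point about units, here is a minimal sketch that evaluates the same KDE on heights in centimetres and in metres. Assumptions: it uses NumPy only, with a hypothetical helper `gaussian_kde_1d` that applies Scott's rule for the bandwidth (the default rule of `scipy.stats.gaussian_kde` for 1-D data); the seed and grid are arbitrary choices.

```python
import numpy as np


def gaussian_kde_1d(data, xs):
    """Evaluate a 1-D Gaussian KDE of `data` at the points `xs`.

    Hypothetical helper: bandwidth from Scott's rule (std * n**(-1/5)),
    the default rule of scipy.stats.gaussian_kde for 1-D data.
    """
    data = np.asarray(data, dtype=float)
    bw = data.std(ddof=1) * len(data) ** (-1 / 5)
    z = (np.asarray(xs, dtype=float)[:, None] - data[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * bw * np.sqrt(2 * np.pi))


rng = np.random.default_rng(999)
heights_cm = rng.normal(165, 15, size=10000)
heights_m = heights_cm / 100  # the same observations, expressed in metres

xs_cm = np.linspace(100, 230, 261)  # evaluation grid in cm (step 0.5 cm)
xs_m = xs_cm / 100                  # the same grid in metres

dens_cm = gaussian_kde_1d(heights_cm, xs_cm)
dens_m = gaussian_kde_1d(heights_m, xs_m)

peak_cm, peak_m = dens_cm.max(), dens_m.max()
# Riemann-sum approximation of the area under each curve (should be close to 1)
area_cm = dens_cm.sum() * (xs_cm[1] - xs_cm[0])
area_m = dens_m.sum() * (xs_m[1] - xs_m[0])

print(f"peak density: {peak_cm:.4f} per cm vs {peak_m:.4f} per m")
print(f"area under curve: {area_cm:.3f} (cm) vs {area_m:.3f} (m)")
```

Both curves enclose an area of about 1, but the peak in metres is 100 times the peak in centimetres and well above 1, which is why the density axis cannot be read as a number of observations.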

To get something similar to counts, you first need to define the bin width (sns.distplot does this for you when it draws a histogram) and use scipy.stats.gaussian_kde to estimate the density. The resulting values are densities; you convert them by multiplying each density value by the bin width and the number of observations, e.g. counts_i = n * dens_i * binwidth.
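A minimal sketch of that conversion, checked against an actual histogram. Assumptions: it uses NumPy only, with a hypothetical helper `gaussian_kde_1d` that mirrors the default Scott's-rule bandwidth of `scipy.stats.gaussian_kde`, and it takes the bin width from `np.histogram` so the rescaled curve and the bar heights share the same bins.

```python
import numpy as np


def gaussian_kde_1d(data, xs):
    """Hypothetical helper: 1-D Gaussian KDE with a Scott's-rule bandwidth."""
    data = np.asarray(data, dtype=float)
    bw = data.std(ddof=1) * len(data) ** (-1 / 5)
    z = (np.asarray(xs, dtype=float)[:, None] - data[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * bw * np.sqrt(2 * np.pi))


def kde_as_counts(data, nbins):
    """Rescale the KDE to the scale of a histogram of `data` with `nbins` equal-width bins."""
    counts, edges = np.histogram(data, bins=nbins)
    binwidth = edges[1] - edges[0]
    centres = (edges[:-1] + edges[1:]) / 2
    # counts_i = n * dens_i * binwidth
    approx_counts = len(data) * gaussian_kde_1d(data, centres) * binwidth
    return centres, counts, approx_counts


rng = np.random.default_rng(999)
heights = rng.normal(165, 15, size=10000)
centres, counts, approx_counts = kde_as_counts(heights, nbins=50)

# Both totals should be close to len(heights)
print(counts.sum(), approx_counts.sum().round())
```

`approx_counts` should track `counts` bin by bin; the small shortfall in its total comes from the part of the KDE that spills beyond the range of the data.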

As noted by @mwaskom (see comments), it may not be the best idea to show only the KDE plot with the y-axis as counts.
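One way to see why (our reading, not necessarily the exact argument in the comments): the "counts" scale is n * density * binwidth, so it is set entirely by whichever bin width you pick, even though no bins are drawn. This sketch assumes NumPy only and reuses a hypothetical Scott's-rule KDE helper; the bin counts 25, 50 and 100 are arbitrary.

```python
import numpy as np


def gaussian_kde_1d(data, xs):
    """Hypothetical helper: 1-D Gaussian KDE with a Scott's-rule bandwidth."""
    data = np.asarray(data, dtype=float)
    bw = data.std(ddof=1) * len(data) ** (-1 / 5)
    z = (np.asarray(xs, dtype=float)[:, None] - data[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * bw * np.sqrt(2 * np.pi))


rng = np.random.default_rng(999)
heights = rng.normal(165, 15, size=10000)
xs = np.linspace(heights.min(), heights.max(), 200)
dens = gaussian_kde_1d(heights, xs)  # the curve itself never changes below

peaks = {}
for nbins in (25, 50, 100):
    # Each bin-width choice gives a different "counts" scale for the same curve
    binwidth = (heights.max() - heights.min()) / nbins
    peaks[nbins] = (len(heights) * dens * binwidth).max()
    print(f"nbins={nbins:3d}: peak 'count' = {peaks[nbins]:.0f}")
```

Halving the bin width halves every "count", so without the histogram bars next to it the y-axis values of a lone KDE have no fixed meaning.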

We can check this:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import gaussian_kde

np.random.seed(999)
height_mu = 165
height_sigma = 15
heights = np.random.normal(height_mu, height_sigma, size=10000)
nbins = 50

fig, ax = plt.subplots(1, 3, figsize=(10, 4))

# Left: plain histogram with counts (no normalisation, no KDE)
sns.distplot(heights, hist=True, norm_hist=False, kde=False, bins=nbins, ax=ax[0])

# Middle: seaborn's KDE alone; the y-axis is a density
sns.distplot(heights, hist=False, bins=nbins, ax=ax[1])
ax[1].axvline(165, color='red', label='Mean height (in cm)', linewidth=2)

# Right: the same KDE rescaled to "counts": n * density * binwidth
dens = gaussian_kde(heights)
binwidth = (heights.max() - heights.min()) / nbins  # width of each histogram bin
xs = np.linspace(heights.min(), heights.max(), num=200)
ax[2].plot(xs, len(heights) * dens(xs) * binwidth)
ax[2].axvline(165, color='red', label='Mean height (in cm)', linewidth=2)

fig.tight_layout()
plt.show()


The first plot, on the left, is the histogram with counts; the second is the density plot you have; and the one on the right is the density rescaled to "counts".
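If the bin width is taken from `np.linspace(heights.min(), heights.max(), num=nbins, retstep=True)`, note that the returned step is (max - min)/(nbins - 1), while a histogram with `nbins` bins over the same range uses bins (max - min)/nbins wide. A small NumPy-only check, assuming an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(999)
heights = rng.normal(165, 15, size=10000)
nbins = 50

# Bin width actually used by a histogram with `nbins` equal-width bins over [min, max]
counts, edges = np.histogram(heights, bins=nbins)
hist_binwidth = edges[1] - edges[0]

# Spacing returned by np.linspace when asked for `nbins` points over the same range
_, linspace_step = np.linspace(heights.min(), heights.max(), num=nbins, retstep=True)

ratio = linspace_step / hist_binwidth  # equals nbins / (nbins - 1)
print(hist_binwidth, linspace_step, ratio)
```

With `nbins = 50` the linspace step is about 2% larger, so a curve scaled with it sits about 2% above the histogram bars; computing the width as (max - min)/nbins avoids that.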

Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.