How do I draw a histogram for a normal distribution using python matplotlib?

My question is: use the NumPy function np.random.randn to generate data x for a normal distribution with 100,000 points, then plot a histogram.

My computation is:

x = sp.norm.pdf(np.random.randn(100000))
plt.hist(x, bins = 20, facecolor='blue', alpha=0.5)

Is there something wrong? I can't get the histogram of a normal distribution.


import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(100_000)
plt.hist(x, bins=20, facecolor="blue", alpha=0.5)

plt.show()

To obtain N random samples from a standard normal distribution, you can use either np.random.randn(N) or scipy's stats.norm.rvs(size=N). These samples can then be used to create the histogram.

To draw the curve, stats.norm.pdf(y) can be used, where y is an array of consecutive x-values. Such a pdf is normalized, i.e. the area under the curve is 1. The total area of the histogram is the number of samples times the width of the bins (each sample falls in exactly one bin). Therefore, multiplying the pdf by that factor scales it to the height of the histogram.

The result of stats.norm.pdf(np.random.randn(N)) would be an array of probability density values, one for each of the N random samples. Most samples end up near the center of the curve (at y = 0), where the height of the pdf is about 0.40. This explains the high peak near that maximum in the histogram from the question.
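The effect described above can be checked numerically without plotting. The following is only a sketch: the helper standard_normal_pdf is a hypothetical stand-in that writes out the standard normal density formula (the values stats.norm.pdf returns for the default loc=0, scale=1), and the seeded generator is an assumption made for reproducibility.

```python
import numpy as np


def standard_normal_pdf(x):
    """Density of the standard normal distribution (hypothetical helper)."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)


# Seeded generator: an assumption for reproducibility, in place of np.random.randn
rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)

# This mirrors the question's mistake: density values instead of the samples
densities = standard_normal_pdf(samples)

peak = 1 / np.sqrt(2 * np.pi)  # height of the pdf at 0, about 0.40
share_near_peak = np.mean(densities > 0.3)

print(f"smallest value: {densities.min():.4f}")
print(f"largest value:  {densities.max():.4f} (upper bound {peak:.4f})")
print(f"share of values above 0.3: {share_near_peak:.2f}")
```

Every value lies between 0 and roughly 0.40, and more than half of them sit above 0.3, so a histogram of these values piles up at the right-hand edge instead of forming a bell shape.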

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

N = 100000
# x = np.random.randn(N)
x = stats.norm.rvs(size=N)
num_bins = 20
plt.hist(x, bins=num_bins, facecolor='blue', alpha=0.5)

y = np.linspace(-4, 4, 1000)
bin_width = (x.max() - x.min()) / num_bins
plt.plot(y, stats.norm.pdf(y) * N * bin_width)

plt.show()
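The scaling factor N * bin_width used above can be verified with np.histogram, which computes the same bin counts that plt.hist draws. This is a minimal sketch under the assumption that the bins have equal width and span the range from x.min() to x.max(), which is the default behaviour; the seeded generator is likewise an assumption for reproducibility.

```python
import numpy as np

# Seeded generator: an assumption for reproducibility
rng = np.random.default_rng(0)
N = 100_000
num_bins = 20
x = rng.standard_normal(N)

# Raw counts: the total area equals N times the bin width
counts, edges = np.histogram(x, bins=num_bins)
bin_width = edges[1] - edges[0]
area_counts = np.sum(counts * np.diff(edges))

# density=True rescales the bar heights so that the total area is 1
density, _ = np.histogram(x, bins=num_bins, density=True)
area_density = np.sum(density * np.diff(edges))

print(f"area of count histogram:   {area_counts:.2f}")
print(f"N * bin_width:             {N * bin_width:.2f}")
print(f"area of density histogram: {area_density:.2f}")
```

plt.hist accepts the same density=True argument, so an alternative to scaling the curve is to normalize the histogram and plot stats.norm.pdf(y) unscaled.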

Example plot

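import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

N = 1000
x = np.random.randn(N)
# kde=True overlays a kernel density estimate on the histogram
sns.histplot(x, bins=20, kde=True, color="red")
plt.show()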

histplot using seaborn
