How to find percentage of values within given range in python?
Problem statement - Variable X has a mean of 15 and a standard deviation of 2.
What is the minimum percentage of X values that lie between 8 and 17?
I know about the 68-95-99.7 empirical rule. From Google I found that the percentage of values within 1.5 standard deviations is 86.64%. My code so far:
import scipy.stats
import numpy as np
X=np.random.normal(15,2)
As I understood,
13-17 is within 1 standard deviation, containing 68% of the values.
9-21 is within 3 standard deviations, containing 99.7% of the values.
7-23 is within 4 standard deviations. So 8 is 3.5 standard deviations below the mean.
How do I find the percentage of values from 8 to 17?
You basically want to know the area under the Probability Density Function (PDF) from x1=8 to x2=17, assuming X is normally distributed.
The integral of the PDF from minus infinity up to a value x is the Cumulative Distribution Function (CDF) evaluated at x.
Thus, to find the area between two specific values of x you need to integrate the PDF between these values, which is equivalent to computing CDF(x2) - CDF(x1).
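Written out for this problem, the relationship looks as follows. This is only a worked sketch: it assumes X is normally distributed with mean 15 and standard deviation 2, and the symbols f, F and \Phi (the PDF, the CDF and the standard normal CDF) are notation introduced here, not part of the original post.

```latex
P(8 \le X \le 17) = \int_{8}^{17} f(x)\,dx = F(17) - F(8)
  = \Phi\!\left(\tfrac{17-15}{2}\right) - \Phi\!\left(\tfrac{8-15}{2}\right)
  = \Phi(1) - \Phi(-3.5) \approx 0.8413 - 0.0002 \approx 0.8411
```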
So, in Python, we could do
import numpy as np
import scipy.stats as sps
import matplotlib.pyplot as plt
mu = 15
sd = 2
# define the distribution
dist = sps.norm(loc=mu, scale=sd)
x = np.linspace(dist.ppf(.00001), dist.ppf(.99999))
# Probability Density Function
pdf = dist.pdf(x)
# Cumulative Distribution Function
cdf = dist.cdf(x)
and plot to take a look
fig, axs = plt.subplots(1, 2, figsize=(12, 5))
# left panel: PDF with the area between 8 and 17 shaded
axs[0].plot(x, pdf, color='k')
axs[0].fill_between(
    x[(x>=8)&(x<=17)],
    pdf[(x>=8)&(x<=17)],
    alpha=.25
)
axs[0].set(
    title='PDF'
)
# right panel: CDF with its values at 8 and 17 marked
axs[1].plot(x, cdf)
axs[1].axhline(dist.cdf(8), color='r', ls='--')
axs[1].axhline(dist.cdf(17), color='r', ls='--')
axs[1].set(
    title='CDF'
)
plt.show()
So, the value we want is that area, which we can calculate as
cdf_at_8 = dist.cdf(8)
cdf_at_17 = dist.cdf(17)
cdf_between_8_17 = cdf_at_17 - cdf_at_8
print(f"{cdf_between_8_17:.1%}")
which gives 84.1%.
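As a sanity check, the same figure can be reproduced in two other ways: from z-scores on the standard normal, and by simulation. The sketch below is only illustrative and assumes X is normally distributed. The variable names, the sample size of one million and the seed are arbitrary choices made here. Note that `np.random.normal(15, 2)` as written in the question returns a single value, so a `size` argument is needed to draw many samples.

```python
import numpy as np
import scipy.stats as sps

mu, sd = 15, 2   # mean and standard deviation from the question
lo, hi = 8, 17   # range of interest

# Exact value from the normal CDF (same calculation as above)
dist = sps.norm(loc=mu, scale=sd)
exact = dist.cdf(hi) - dist.cdf(lo)

# Same value via z-scores on the standard normal
z_lo, z_hi = (lo - mu) / sd, (hi - mu) / sd   # -3.5 and 1.0
via_z = sps.norm.cdf(z_hi) - sps.norm.cdf(z_lo)

# Monte Carlo estimate: draw many samples and count those inside the range
rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily for reproducibility
samples = rng.normal(mu, sd, size=1_000_000)
simulated = np.mean((samples >= lo) & (samples <= hi))

print(f"exact:     {exact:.2%}")
print(f"z-scores:  {via_z:.2%}")
print(f"simulated: {simulated:.2%}")
```

The exact and z-score values agree, and the simulated fraction should land close to them. This also matches the empirical-rule reasoning in the question: half of the roughly 68.27% within one standard deviation (the part above the mean) plus half of the roughly 99.95% within 3.5 standard deviations (the part below the mean) is about 34.13% + 49.98% = 84.11%.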