
python scipy.stats pdf and expect functions

I was wondering if someone could please explain what the following functions in scipy.stats do:

rv_continuous.expect
rv_continuous.pdf

I have read the documentation, but I am still confused.

Here is my task. It is quite simple in theory, but I am still confused about what these functions do.

So, I have a list of 16383 area values. I want to find the probability that the area variable takes a value between a smaller value, called "inf", and a larger value, called "sup".

So, what I thought of is:

scipy.stats.rv_continuous.pdf(a) #a being the list of areas
scipy.stats.rv_continuous.expect(pdf, lb = inf, ub = sup)

That way I could get the probability that an area lies between inf and sup.

Can anyone help me by explaining in a simple way what these functions do, and give any hint on how to compute the integral of f(a) between inf and sup, please?

Thanks

Blaise

rv_continuous is the base class for all of the continuous probability distributions implemented in scipy.stats (discrete ones derive from rv_discrete). You would not call methods on rv_continuous yourself; instead, you call them on a concrete distribution object such as scipy.stats.norm or scipy.stats.lognorm.
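As a minimal sketch of what the two methods from the question do, here they are called on a concrete distribution. The standard normal is an arbitrary choice for illustration, not an assumption about the areas. `pdf` evaluates the density at a point; `expect` integrates `func(x) * pdf(x)` between `lb` and `ub`, so passing a function that always returns 1 yields the probability of landing between the bounds.

```python
from scipy import stats

# A concrete continuous distribution (standard normal, chosen only for illustration).
dist = stats.norm  # loc=0, scale=1 by default

# pdf: the probability *density* at a single point, not a probability.
density_at_zero = dist.pdf(0.0)

# expect: integrates func(x) * pdf(x) between lb and ub.
# With func returning 1, the result is the probability that lb <= X <= ub.
inf, sup = -1.0, 1.0  # placeholder bounds
prob_via_expect = dist.expect(lambda x: 1.0, lb=inf, ub=sup)

# The same probability from the CDF, for comparison.
prob_via_cdf = dist.cdf(sup) - dist.cdf(inf)

print(density_at_zero)                # about 0.3989
print(prob_via_expect, prob_via_cdf)  # both about 0.6827
```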

Your question is not entirely clear about what you want to do, so I will assume that you have an array of 16383 data points drawn from some unknown probability distribution. From the raw data, you will need to estimate the cumulative distribution, evaluate that cumulative distribution at the sup and inf values, and subtract the two to find the probability that a value drawn from the unknown distribution falls between inf and sup.

There are lots of ways to estimate the unknown distribution from the data, depending on how much modelling you want to do and how many assumptions you want to make. At the more complicated end of the spectrum, you could try to fit one of the standard parametric probability distributions to the data. For example, if you suspected that your data were lognormally distributed, you could use scipy.stats.lognorm.fit(data, floc=0) to find the parameters of the lognormal distribution that fit your data (floc=0 fixes the location parameter at 0). Then you could use scipy.stats.lognorm.cdf(sup, *params) - scipy.stats.lognorm.cdf(inf, *params) to estimate the probability of the value being between those values.
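A minimal sketch of this parametric route. The original 16383 areas aren't available, so the code generates synthetic lognormal data as a stand-in (an assumption), and `inf`/`sup` are placeholder bounds. With `floc=0`, only the shape and scale are fitted, and `fit` returns them as `(shape, loc, scale)`, which unpacks straight into `cdf`.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the 16383 areas (assumption: lognormally distributed).
rng = np.random.default_rng(seed=0)
data = rng.lognormal(mean=0.0, sigma=0.5, size=16383)

inf, sup = 0.5, 2.0  # placeholder bounds

# Fit a lognormal with the location fixed at 0; returns (shape, loc, scale).
params = stats.lognorm.fit(data, floc=0)

# Probability of a value falling between inf and sup under the fitted model.
p_fit = stats.lognorm.cdf(sup, *params) - stats.lognorm.cdf(inf, *params)
print(params, p_fit)
```

Whether this is a good estimate depends entirely on whether the chosen family actually describes the data; checking the fit (for example against a histogram) is worth doing before trusting `p_fit`.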

In the middle are the non-parametric forms of distribution estimation, like histograms and kernel density estimates. For example, scipy.stats.gaussian_kde(data).integrate_box_1d(inf, sup) is an easy way to make this estimate using a Gaussian kernel density estimate of the unknown distribution. However, kernel density estimates aren't always appropriate and require some tweaking (chiefly the choice of bandwidth) to get right.
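A minimal sketch of the KDE approach, again on synthetic lognormal data standing in for the real areas (an assumption), with placeholder bounds. The `bw_method` argument is the main knob to tweak: it accepts `"scott"` (the default), `"silverman"`, a scalar factor, or a callable. The scalar 0.05 below is an arbitrary example value.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the areas (assumption; the real data isn't available).
rng = np.random.default_rng(seed=0)
data = rng.lognormal(mean=0.0, sigma=0.5, size=16383)

inf, sup = 0.5, 2.0  # placeholder bounds

# Gaussian KDE with the default bandwidth (Scott's rule).
kde = stats.gaussian_kde(data)
p_kde = kde.integrate_box_1d(inf, sup)

# Same estimate with a narrower, hand-picked bandwidth factor.
kde_narrow = stats.gaussian_kde(data, bw_method=0.05)
p_kde_narrow = kde_narrow.integrate_box_1d(inf, sup)

print(p_kde, p_kde_narrow)
```

Comparing results across a few bandwidths is a cheap way to see how sensitive the estimate is to that choice.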

The simplest thing you could do is just count the number of data points that fall between inf and sup and divide by the total number of data points that you have. This only works well with a largish number of points (which you have) and with bounds that aren't too far into the tails of the data.
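A minimal sketch of the counting approach with NumPy. The data is a synthetic stand-in (an assumption) and the bounds are placeholders; in practice `data` would be the array of 16383 areas.

```python
import numpy as np

# Synthetic stand-in for the areas (assumption; the real data isn't available).
rng = np.random.default_rng(seed=0)
data = rng.lognormal(mean=0.0, sigma=0.5, size=16383)

inf, sup = 0.5, 2.0  # placeholder bounds

# Boolean mask of observations strictly between inf and sup.
in_range = (data > inf) & (data < sup)

# Fraction of observations in range = empirical probability.
p_count = np.count_nonzero(in_range) / data.size
print(p_count)
```

`np.mean(in_range)` gives the same number, since the mean of a boolean array is the fraction of `True` values.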

The cumulative distribution function (CDF) might give you what you want. The probability P of the area being between two values is then P(inf < area < sup) = cdf(sup) - cdf(inf).
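Since the question also asked how to integrate f(a) between inf and sup: the CDF difference is exactly that integral of the pdf. Here is a small check using `scipy.integrate.quad`, with a gamma distribution chosen arbitrarily as the example (an assumption, not a claim about the areas) and placeholder bounds.

```python
from scipy import integrate, stats

# Example distribution (arbitrary choice for illustration): gamma, shape 2, scale 1.5.
dist = stats.gamma(a=2.0, scale=1.5)
inf, sup = 1.0, 4.0  # placeholder bounds

# Numerically integrate the pdf f(a) from inf to sup.
p_integral, abs_err = integrate.quad(dist.pdf, inf, sup)

# Same probability from the CDF: P(inf < area < sup) = cdf(sup) - cdf(inf).
p_cdf = dist.cdf(sup) - dist.cdf(inf)

print(p_integral, p_cdf)  # the two agree to within quad's error estimate
```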

There are tutorials about probabilities here and here. The pdf, the cdf, and the expectation are all related. The pdf is the "density" of the probabilities: it must be non-negative everywhere and integrate to 1 over all possible values (for a discrete distribution, the probability mass function sums to 1). I think of it as indicating how likely values near a given point are. The expectation is a generalisation of the idea of an average:

E[x] = Σ x · P(x)   (summing over all possible values x)
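To make the expectation formula concrete, a fair six-sided die is used as an illustrative example (not from the original answer). The probability-weighted sum and scipy's `expect` on the discrete `randint` distribution give the same value. For a continuous distribution the sum becomes the integral of x · pdf(x), which is what `expect()` computes when no function is passed; the exponential with scale 3 is another arbitrary example.

```python
import numpy as np
from scipy import stats

# A fair die: values 1..6, each with probability 1/6.
x = np.arange(1, 7)
p = np.full(6, 1 / 6)

# E[x] = sum(x * P(x)), the probability-weighted average.
e_manual = np.sum(x * p)

# scipy's discrete uniform on {1, ..., 6}; randint(low, high) excludes high.
e_scipy = stats.randint(1, 7).expect()

# Continuous case: expect() integrates x * pdf(x); an exponential with scale 3 has mean 3.
e_expon = stats.expon(scale=3.0).expect()

print(e_manual, e_scipy, e_expon)  # approximately 3.5, 3.5, 3.0
```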

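Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.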

 
© 2020-2024 STACKOOM.COM