
Fitting a Pareto distribution with (Python) SciPy

I have a data set that I know follows a Pareto distribution. Can someone point me to how to fit this data set in SciPy? I got the code below to run, but I have no idea what is being returned to me (a, b, c). Also, after obtaining a, b, c, how do I calculate the variance using them?

import scipy.stats as ss 
import scipy as sp

a,b,c=ss.pareto.fit(data)
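
For reference, `scipy.stats.pareto.fit` returns its estimates in the order (shape, loc, scale), so `a, b, c` in the call above are the shape parameter, the location, and the scale. The following is a minimal sketch of how the variance could be computed from them. It rests on assumptions that are not part of the question: the `data` array is synthetic and generated only for illustration, and the location is assumed to be known and is held at 0 with `floc=0`. The variance of a Pareto distribution is finite only when the shape exceeds 2.

```python
import numpy as np
import scipy.stats as ss

# Synthetic sample for illustration only (not the asker's data):
# shape b = 3, loc = 0, scale = 2.
data = ss.pareto.rvs(3.0, loc=0.0, scale=2.0, size=5000, random_state=0)

# fit() returns (shape, loc, scale); floc=0 keeps the location fixed at 0.
b_hat, loc_hat, scale_hat = ss.pareto.fit(data, floc=0)

# Variance of the fitted distribution; finite only when the shape exceeds 2.
var_hat = ss.pareto.var(b_hat, loc=loc_hat, scale=scale_hat)

# Closed form for comparison: scale^2 * b / ((b - 1)^2 * (b - 2)) for b > 2.
var_formula = scale_hat**2 * b_hat / ((b_hat - 1.0) ** 2 * (b_hat - 2.0))

print(b_hat, loc_hat, scale_hat, var_hat)
```

If the fitted shape comes out at 2 or below, the theoretical variance is infinite, and a sample variance computed from the data would not be a stable estimate.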

Be very careful fitting power laws! Many reported power laws are actually badly fitted by a power law. See Clauset et al. for all the details (also on arXiv if you don't have access to the journal). They have a companion website to the article, which now links to a Python implementation. I don't know if it uses SciPy, because I used their R implementation when I last used it.

Here's a quickly written version, taking some hints from the reference page that Rupert gave. This is currently work in progress in scipy and statsmodels and requires MLE with some fixed or frozen parameters, which is only available in the trunk versions. No standard errors on the parameter estimates or other result statistics are available yet.

'''estimating pareto with 3 parameters (shape, loc, scale) with nested
minimization, MLE inside minimizing Kolmogorov-Smirnov statistic

running some examples looks good
Author: josef-pktd
'''

import numpy as np
from scipy import stats, optimize
#the following adds my frozen fit method to the distributions
#scipy trunk also has a fit method with some parameters fixed.
import scikits.statsmodels.sandbox.stats.distributions_patch

true = (0.5, 10, 1.)   # try different values
shape, loc, scale = true
rvs = stats.pareto.rvs(shape, loc=loc, scale=scale, size=1000)

rvsmin = rvs.min() #for starting value to fmin


def pareto_ks(loc, rvs):
    est = stats.pareto.fit_fr(rvs, 1., frozen=[np.nan, loc, np.nan])
    args = (est[0], loc, est[1])
    return stats.kstest(rvs,'pareto',args)[0]

locest = optimize.fmin(pareto_ks, rvsmin*0.7, (rvs,))
est = stats.pareto.fit_fr(rvs, 1., frozen=[np.nan, locest, np.nan])
args = (est[0], locest[0], est[1])
print 'estimate'
print args
print 'kstest'
print stats.kstest(rvs,'pareto',args)
print 'estimation error', args - np.array(true)
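
The same nested idea can be sketched against a current SciPy release, where `fit` accepts the keywords `floc` and `fscale` to hold parameters fixed, so the sandbox patch is not needed. This is an adaptation under stated assumptions, not the answer author's code: the guard that returns 1.0 when the trial location is not below the sample minimum, the fixed `random_state`, and the names `start`, `loc_hat`, `ks_start`, and `ks_final` are all additions made for illustration.

```python
import numpy as np
from scipy import stats, optimize

true = (0.5, 10.0, 1.0)  # shape, loc, scale; try different values
rvs = stats.pareto.rvs(true[0], loc=true[1], scale=true[2],
                       size=1000, random_state=0)
rvsmin = rvs.min()  # used for the starting value passed to fmin


def pareto_ks(loc, rvs):
    """KS statistic of the Pareto MLE fit with the location held at `loc`."""
    loc = float(np.ravel(loc)[0])  # fmin passes a length-1 array
    if loc >= rvs.min():
        # Assumed guard: the data must lie above loc, so penalize such
        # trial values with the largest possible KS statistic.
        return 1.0
    b, _, scale = stats.pareto.fit(rvs, floc=loc)  # inner MLE, loc fixed
    return stats.kstest(rvs, 'pareto', args=(b, loc, scale)).statistic


# Outer minimization of the KS statistic over the location.
start = rvsmin * 0.7
locest = optimize.fmin(pareto_ks, start, args=(rvs,), disp=False)
loc_hat = float(locest[0])

# Refit shape and scale at the selected location.
b_hat, _, scale_hat = stats.pareto.fit(rvs, floc=loc_hat)
estimate = (b_hat, loc_hat, scale_hat)

ks_start = pareto_ks(start, rvs)
ks_final = pareto_ks(loc_hat, rvs)
print('estimate', estimate)
print('KS statistic', ks_final)
print('estimation error', np.array(estimate) - np.array(true))
```

As in the original, this gives point estimates only, with no standard errors on the parameters.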

Before passing the data to the build() function in OpenTURNS, make sure to convert it this way:

data = [[i] for i in data]

Otherwise, the Sample() function may return an error.

FYI @Tropilio

Let's say your data is formatted like this:

import openturns as ot
data = [
    [2.7018013],
    [8.53280352],
    [1.15643882],
    [1.03359467],
    [1.53152735],
    [32.70434285],
    [12.60709624],
    [2.012235],
    [1.06747063],
    [1.41394096],
]
# data is already a list of one-element rows, so it can be passed directly
sample = ot.Sample(data)

You can easily fit a Pareto distribution using the ParetoFactory of the OpenTURNS library:

distribution = ot.ParetoFactory().build(sample)

You can of course print it:

print(distribution)
>>> Pareto(beta = 0.00317985, alpha=0.147365, gamma=1.0283)

or plot its PDF:

from openturns.viewer import View

pdf_graph = distribution.drawPDF()
pdf_graph.setTitle(str(distribution))
View(pdf_graph, add_legend=False)

[Figure: Pareto distribution]

More details on the ParetoFactory are provided in the documentation.
