Power law distribution fitting in Python

I am using different Python libraries to fit density functions to a dataset. This data set is made of positive time values starting from 1 second.

I tested different density functions from scipy.stats and the powerlaw library, as well as my own functions fitted with scipy.optimize's curve_fit().

So far, I obtained the best results when fitting the following "modified" power law function:

def funct(x, alpha, x0):
    return((x+x0)**(-alpha))

My code is as follows:

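import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate, optimize as opt

# s_distrib: 1-D array of positive durations in seconds (the data set, not shown here)
# One-second-wide bins starting at 1; density=True normalises the histogram
bins = range(1,int(s_distrib.max())+2,1)
y_data, x_data = np.histogram(s_distrib, bins=bins, density=True)
x_data = x_data[:-1]  # keep the left edge of each bin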

param_bounds=([0,-np.inf],[np.inf,np.inf])
fit = opt.curve_fit(funct,
                    x_data,
                    y_data,
                    bounds=param_bounds) # you can pass guess for the parameters/errors
alpha,x0 = fit[0]
print(fit[0])

C = 1/integrate.quad(lambda t: funct(t,alpha,x0),1,np.inf)[0]

# Calculate fitted PDF and error with fit in distribution
pdf = [C*funct(x,alpha,x0) for x in x_data]
sse = np.sum(np.power(y_data - pdf, 2.0))
print(sse)

fig, ax = plt.subplots(figsize=(6,4))
# loglog axes use base 10 by default; recent Matplotlib releases no longer accept basex/basey
ax.loglog(x_data, y_data, linestyle='None', marker='.')
ax.loglog(x_data, pdf, linestyle='None', marker='.')

The fit returns a value of 8.48 for x0 and 1.40 for alpha. In the log-log plot, the data and the fit look like this:

[Figure: log-log plot of the data and the fit]

  • My first question is technical. Why do I get the following warning and error in opt.curve_fit when changing (x+x0) to (x-x0) in the funct function? Since my bounds for x0 are (-inf, +inf), I was expecting the fit to return -8.48.

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: divide by zero encountered in reciprocal
  This is separate from the ipykernel package so we can avoid doing imports until
ValueError: Residuals are not finite in the initial point.

  • My other questions are theoretical. Is (x+x0)^(-alpha) a standard distribution? What does the x0 value represent, and how should this 8.48 s value be interpreted physically? From what I understand, this means that my distribution corresponds to a shifted power law distribution. Can I consider that x0 corresponds to the xmin value classically needed when fitting data to power laws?
  • Concerning this xmin value, I understand that it can make sense to consider only the data greater than this threshold in the fitting process, in order to characterise the tail of the distribution. However, I am wondering what the standard way is to characterise the full data with a distribution that would be a power law after xmin and something else before xmin.

This is a lot of questions, as I am very unfamiliar with the subject. Any comment or answer, even partial, will be much appreciated!

Is (x+x0)^(-alpha) a standard distribution?

To answer your second question: yes, it is a standard distribution, called the Zipf distribution (with the shift x0 included, the discrete form is usually called the Zipf-Mandelbrot law). It is implemented in Python/NumPy as well.
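As a minimal sketch of the NumPy/SciPy implementation mentioned above, the snippet below draws samples with `numpy.random.Generator.zipf` and evaluates the probability mass function with `scipy.stats.zipf`. The exponent `a = 2.0`, the seed and the sample size are arbitrary illustrative choices, not values from the question. Both functions implement the unshifted discrete law, P(k) = k^(-a) / zeta(a) for k = 1, 2, ..., so they do not include the x0 shift.

```python
import numpy as np
from scipy import special, stats

a = 2.0                              # illustrative exponent; must be > 1
rng = np.random.default_rng(0)       # fixed seed, only for reproducibility
samples = rng.zipf(a, size=10_000)   # integer samples, all >= 1

# The pmf is k**(-a) divided by the Riemann zeta function zeta(a)
k = np.arange(1, 6).astype(float)
pmf = stats.zipf.pmf(k, a)
manual = k ** (-a) / special.zeta(a, 1)

print(np.allclose(pmf, manual))
print(samples.min())
```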

What does the x0 value represent?

This is a shift parameter. Any distribution, on top of its standard parameters (like the power parameter in Zipf), might have shift and scale parameters, which basically say that your X values are measured in different units with a different origin point.
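For continuous data, one way to see the shift and scale parameters at work is the Lomax (Pareto type II) distribution, whose density is proportional to (x + x0)^(-alpha). The sketch below is an illustration under assumptions, not the asker's method: it takes the fitted values alpha = 1.40 and x0 = 8.48 from the question, assumes a support starting at 0, and checks that the normalised density C*(x + x0)^(-alpha) with C = (alpha - 1) * x0^(alpha - 1) matches `scipy.stats.lomax` with shape `c = alpha - 1`, `loc = 0` and `scale = x0`.

```python
import numpy as np
from scipy import stats

alpha, x0 = 1.40, 8.48   # fitted values quoted in the question

# Normalisation constant of (x + x0)**(-alpha) on [0, inf), valid for alpha > 1
C = (alpha - 1.0) * x0 ** (alpha - 1.0)

x = np.linspace(0.0, 100.0, 11)
manual_pdf = C * (x + x0) ** (-alpha)

# Same density through SciPy: shape c = alpha - 1, scale = x0, origin loc = 0
lomax_pdf = stats.lomax.pdf(x, c=alpha - 1.0, loc=0.0, scale=x0)

print(np.allclose(manual_pdf, lomax_pdf))
```

In this parametrisation `scale` plays the role of x0, and `loc` moves the origin: for data that start at 1 second, choosing `loc = 1` would give a density proportional to (x + scale - 1)^(-alpha).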

Concerning this xmin value, I understand that it can make sense to consider only the data greater than this threshold in the fitting process, in order to characterise the tail of the distribution.

This is how Zipf's law is defined, from 0 to infinity. Shifting it means your origin would be different.
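The first, technical question is not addressed above. A likely explanation, offered as a sketch rather than a confirmed diagnosis: when no `p0` is passed and the bounds are infinite, `curve_fit` starts the parameters at 1, so with (x - x0) the first data point x = 1 gives 0 raised to a negative power, which is infinite. The example below reproduces this on synthetic, noiseless data generated from the values quoted in the question (the variable names, the range of `x` and the upper bound 0.999 are assumptions made for illustration), and then fits again with an explicit start and an upper bound on x0 below the smallest x.

```python
import numpy as np
from scipy import optimize as opt

def funct_minus(x, alpha, x0):
    # Same model as in the question, with the sign of x0 flipped
    return (x - x0) ** (-alpha)

# Synthetic stand-in for the histogram: x starts at 1, as in the question
x = np.arange(1.0, 200.0)
y = (x + 8.48) ** (-1.40)

# With alpha = 1 and x0 = 1, the first point is 0**(-1) = inf
with np.errstate(divide="ignore"):
    start_values = funct_minus(x, 1.0, 1.0)
start_is_finite = bool(np.isfinite(start_values).all())

# Without p0, the fit starts from that non-finite point and raises
bounds = ([0.0, -np.inf], [np.inf, np.inf])
try:
    with np.errstate(divide="ignore"):
        opt.curve_fit(funct_minus, x, y, bounds=bounds)
    default_start_failed = False
except ValueError:
    default_start_failed = True

# Keeping x0 below min(x) and giving a finite start avoids the singularity
safe_bounds = ([0.0, -np.inf], [np.inf, 0.999])
popt, _ = opt.curve_fit(funct_minus, x, y, p0=[1.0, 0.0], bounds=safe_bounds)
alpha_hat, x0_hat = popt

print(start_is_finite, default_start_failed)
print(alpha_hat, x0_hat)
```

On real, noisy data the recovered values will differ, but the same two precautions (an explicit `p0` and an upper bound on x0 below the smallest x value) should keep the residuals finite.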
