Power law distribution fitting in Python
I am using different Python libraries to fit density functions to a dataset. The dataset consists of positive time values starting from 1 second.
I tested different density functions from scipy.stats and the powerlaw library, as well as my own functions fitted with scipy.optimize's curve_fit().
So far, I obtained the best results when fitting the following "modified" power law function:
def funct(x, alpha, x0):
    return (x + x0) ** (-alpha)
My code is as follows:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt
from scipy import integrate

# s_distrib holds the observed time values (all >= 1 second)
bins = range(1, int(s_distrib.max()) + 2, 1)
y_data, x_data = np.histogram(s_distrib, bins=bins, density=True)
x_data = x_data[:-1]  # keep the left edge of each bin

# alpha >= 0, x0 unbounded
param_bounds = ([0, -np.inf], [np.inf, np.inf])
fit = opt.curve_fit(funct, x_data, y_data, bounds=param_bounds)  # an initial guess (p0) and errors (sigma) can also be passed
alpha, x0 = fit[0]
print(fit[0])

# Normalization constant so that the fitted function integrates to 1 on [1, inf)
C = 1 / integrate.quad(lambda t: funct(t, alpha, x0), 1, np.inf)[0]

# Calculate fitted PDF and error with fit in distribution
pdf = C * funct(x_data, alpha, x0)
sse = np.sum(np.power(y_data - pdf, 2.0))
print(sse)

fig, ax = plt.subplots(figsize=(6, 4))
# base 10 is the default for loglog axes
ax.loglog(x_data, y_data, linestyle='None', marker='.')
ax.loglog(x_data, pdf, linestyle='None', marker='.')
The fit returns a value of 8.48 for x0 and 1.40 for alpha. In the log-log plot, the data and the fit look like this:
Why do I get the following warning and error from opt.curve_fit when changing (x+x0) to (x-x0) in the funct function? Since my bounds for x0 are (-inf, +inf), I was expecting the fit to return -8.48.
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: divide by zero encountered in reciprocal
  This is separate from the ipykernel package so we can avoid doing imports until
ValueError: Residuals are not finite in the initial point.
This is a lot of questions as I am very unfamiliar with the subject; any comment or answer, even partial, will be much appreciated!
Is (x+x0)^(-alpha) a standard distribution?
To answer your second question: yes, it is a standard distribution, called the Zipf distribution. It is implemented in Python/NumPy as well.
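As a rough illustration of the implementations mentioned above, the sketch below draws Zipf samples with NumPy's random generator and evaluates the probability mass function with scipy.stats.zipf. The exponent a = 2.4, the shift loc = 8 and the sample size are arbitrary values picked for the example, not values from the question. Note that this distribution is discrete (integer-valued) and that, in SciPy's parameterization, its support starts at k = 1 unless loc shifts it.

```python
import numpy as np
from scipy import special, stats

a = 2.4  # power (exponent) parameter; arbitrary example value

# Draw integer samples (all >= 1) from a Zipf distribution
rng = np.random.default_rng(0)
samples = rng.zipf(a, size=10_000)

# Probability mass function: k**(-a) / zeta(a) for k = 1, 2, ...
k = np.arange(1, 6)
pmf = stats.zipf.pmf(k, a)
manual = k.astype(float) ** (-a) / special.zeta(a)

# loc shifts the origin: pmf(k + loc, a, loc=loc) equals pmf(k, a)
loc = 8
shifted = stats.zipf.pmf(k + loc, a, loc=loc)

print(pmf)
print(shifted)
```

The two printed arrays should agree, which is all that the shift parameter does here: it moves the origin without changing the shape.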
What does the x0 value represent?
This is a shift parameter. On top of its standard parameters (like the power parameter in Zipf), any distribution might have shift and scale parameters, which basically say that your X values are measured in different units and from a different origin point.
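To make the shift and scale idea concrete: continuous distributions in scipy.stats accept loc (shift) and scale keyword arguments. As a sketch, assuming the data are continuous and supported on [1, inf), the normalized form of (x+x0)^(-alpha) coincides with SciPy's Lomax (Pareto type II) density with shape c = alpha - 1, loc = 1 and scale = x0 + 1. The values 1.40 and 8.48 are the fitted values quoted in the question; the identification with Lomax is an assumption of this sketch, not something stated in the original post.

```python
import numpy as np
from scipy import integrate, stats

alpha, x0 = 1.40, 8.48  # fitted values quoted in the question

def funct(x, alpha, x0):
    return (x + x0) ** (-alpha)

# Normalization on [1, inf): numerically, and in closed form (valid for alpha > 1)
C_numeric = 1 / integrate.quad(funct, 1, np.inf, args=(alpha, x0))[0]
C_closed = (alpha - 1) * (1 + x0) ** (alpha - 1)

# Same density expressed through shape, shift (loc) and scale parameters
x = np.linspace(1, 500, 200)
pdf_manual = C_closed * funct(x, alpha, x0)
pdf_lomax = stats.lomax.pdf(x, c=alpha - 1, loc=1, scale=x0 + 1)

print(C_numeric, C_closed)
print(np.allclose(pdf_manual, pdf_lomax))
```

Under this reading, x0 only enters through the scale (x0 + 1), while loc = 1 records that the data start at 1 second.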
Concerning this xmin value, I understand that it can make sense to consider only the data greater than this threshold for the fitting process to characterise the tail of the distribution.
This is how Zipf's law is defined, from 0 to infinity. Shifting it means your origin would be different.
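On the curve_fit error raised in the question when (x+x0) is replaced by (x-x0): a likely cause, assuming no p0 is passed, is the default starting point. With those bounds curve_fit starts from alpha = 1 and x0 = 1, and since the data start at x = 1, the first model value is (1 - 1)^(-1), which is infinite. The sketch below reproduces this on synthetic, noise-free data (generated here for illustration, not the asker's dataset) and then supplies a starting guess and an upper bound that keep x - x0 positive. The name funct_minus, the guess p0 = [1.0, -1.0] and the bound 0.999 are choices made for this example.

```python
import numpy as np
from scipy import optimize as opt

def funct_minus(x, alpha, x0):
    return (x - x0) ** (-alpha)

# Synthetic data shaped like the fit reported in the question
x_data = np.arange(1.0, 50.0)
y_data = (x_data + 8.48) ** (-1.40)

# Default start is alpha = 1, x0 = 1, so x - x0 == 0 at the first point
failed = False
try:
    with np.errstate(divide="ignore"):
        opt.curve_fit(funct_minus, x_data, y_data,
                      bounds=([0, -np.inf], [np.inf, np.inf]))
except ValueError as err:
    failed = True
    print(err)

# Start from a negative x0 and keep x0 below min(x_data) so the base stays positive
popt, _ = opt.curve_fit(funct_minus, x_data, y_data, p0=[1.0, -1.0],
                        bounds=([0, -np.inf], [np.inf, 0.999]))
print(popt)
```

If the second fit converges on such data, x0 should come out negative, which is the sign flip expected in the question.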