Fitting a lognormal distribution using Scipy vs Matlab

Question

I am trying to fit a lognormal distribution using Scipy. I've already done it in Matlab, but because I need to extend the application beyond statistical analysis, I am now trying to reproduce the fitted values in Scipy.

Below is the Matlab code I used to fit my data:

% Read input data (one value per line)
x = [];
fid = fopen(file_path, 'r'); % reading is default action for fopen
disp('Reading network degree data...');
if fid == -1
    disp('[ERROR] Unable to open data file.')
else
    while ~feof(fid)
        [x] = [x fscanf(fid, '%f', [1])];

    end
    c = fclose(fid);
    if c == 0
         disp('File closed successfully.');
    else
        disp('[ERROR] There was a problem with closing the file.');
    end
end

[f,xx] = ecdf(x);
y = 1-f;

parmhat  = lognfit(x); % MLE estimate
mu = parmhat(1);
sigma = parmhat(2);

And here's the fitted plot:

[Image: plot of the Matlab fit]

Now here's my Python code, which aims to achieve the same:

import math
from scipy import stats
from statsmodels.distributions.empirical_distribution import ECDF 

# The same input is read as a list in Python
ecdf_func = ECDF(degrees)
x = ecdf_func.x
ccdf = 1-ecdf_func.y

# Fit data
shape, loc, scale = stats.lognorm.fit(degrees, floc=0)

# Parameters
sigma = shape # standard deviation
mu = math.log(scale) # meanlog of the distribution

# Survival function (1 - CDF); sf takes loc=..., and it must match the floc=0 used in the fit
fit_ccdf = stats.lognorm.sf(x, sigma, loc=0, scale=scale)
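The mapping between Scipy's (shape, loc, scale) and the usual (mu, sigma) can be checked on synthetic data. The following is a minimal sketch, not part of the original post: the "true" parameter values, the sample size, the seed, and the evaluation grid are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical "true" parameters, chosen only for illustration
mu_true, sigma_true = 1.6, 1.3

rng = np.random.RandomState(0)
sample = rng.lognormal(mean=mu_true, sigma=sigma_true, size=20000)

# Fix loc at 0 so that only the shape and scale are estimated
shape, loc, scale = stats.lognorm.fit(sample, floc=0)
sigma_hat = shape        # standard deviation of log(sample)
mu_hat = np.log(scale)   # mean of log(sample)

# Survival function (1 - CDF) on an illustrative grid; note loc=0, not floc
grid = np.logspace(-1, 3, 50)
fit_ccdf = stats.lognorm.sf(grid, sigma_hat, loc=0, scale=scale)

# Cross-check: the same curve from a normal distribution on log(grid)
check = stats.norm.sf(np.log(grid), loc=mu_hat, scale=sigma_hat)

print("mu_hat = %.3f, sigma_hat = %.3f" % (mu_hat, sigma_hat))
```

On a Scipy version without the fitting bug, the recovered values should be close to the ones used to generate the sample, and the two survival curves should agree.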

Here's the fit using the Python code:

[Image: plot of the Python fit]

As you can see, both sets of code are capable of producing good fits, at least visually speaking.

The problem is that there is a huge difference in the estimated parameters mu and sigma.

From Matlab: mu = 1.62, sigma = 1.29. From Python: mu = 2.78, sigma = 1.74.

Why is there such a difference?

Note: I have double-checked that both sets of fitted data are exactly the same. Same number of points, same distribution.

Your help is much appreciated! Thanks in advance.

Other info:

import scipy
import numpy
import statsmodels

scipy.__version__
'0.9.0'

numpy.__version__
'1.6.1'

statsmodels.__version__
'0.5.0.dev-1bbd4ca'

The Matlab version is R2011b.

Edit:

As demonstrated in the answer below, the fault lies with Scipy 0.9. I am able to reproduce the mu and sigma results from Matlab using Scipy 0.11.0.

An easy way to update your Scipy is:

pip install --upgrade Scipy

If you don't have pip (you should!), on Debian/Ubuntu it is packaged as python-pip (python3-pip for Python 3):

sudo apt-get install python-pip

Answer 1

There is a bug in the fit method in scipy 0.9.0 that has been fixed in later versions of scipy.

The output of the script below should be:

Explicit formula:   mu = 4.99203450, sig = 0.81691086
Fit log(x) to norm: mu = 4.99203450, sig = 0.81691086
Fit x to lognorm:   mu = 4.99203468, sig = 0.81691081

but with scipy 0.9.0, it is:

Explicit formula:   mu = 4.99203450, sig = 0.81691086
Fit log(x) to norm: mu = 4.99203450, sig = 0.81691086
Fit x to lognorm:   mu = 4.23197270, sig = 1.11581240

The following test script shows three ways to get the same results:

import numpy as np
from scipy import stats


def lognfit(x, ddof=0):
    x = np.asarray(x)
    logx = np.log(x)
    mu = logx.mean()
    sig = logx.std(ddof=ddof)
    return mu, sig


# A simple data set for easy reproducibility
x = np.array([50., 50, 100, 200, 200, 300, 500])

# Explicit formula
my_mu, my_sig = lognfit(x)

# Fit a normal distribution to log(x)
norm_mu, norm_sig = stats.norm.fit(np.log(x))

# Fit the lognormal distribution
lognorm_sig, _, lognorm_expmu = stats.lognorm.fit(x, floc=0)

# lognorm.fit returns (shape, loc, scale), with shape = sigma and scale = exp(mu)
print("Explicit formula:   mu = %10.8f, sig = %10.8f" % (my_mu, my_sig))
print("Fit log(x) to norm: mu = %10.8f, sig = %10.8f" % (norm_mu, norm_sig))
print("Fit x to lognorm:   mu = %10.8f, sig = %10.8f" % (np.log(lognorm_expmu), lognorm_sig))
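For reference, the "explicit formula" computed by the lognfit helper in the script can be written out as follows. The hat notation and the symbol n for the number of data points are introduced here for illustration; ddof is the helper's own parameter.

```latex
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \ln x_i ,
\qquad
\hat{\sigma}^2 = \frac{1}{n - \mathrm{ddof}} \sum_{i=1}^{n} \left( \ln x_i - \hat{\mu} \right)^2
```

With ddof = 0 these are the maximum-likelihood estimates for a lognormal distribution whose location is fixed at 0, which is why fitting a normal distribution to log(x) gives the same numbers.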

With the option ddof=1 in the standard deviation calculation, to use the unbiased variance estimate:

In [104]: x
Out[104]: array([  50.,   50.,  100.,  200.,  200.,  300.,  500.])

In [105]: lognfit(x, ddof=1)
Out[105]: (4.9920345004312647, 0.88236457185021866)

There is a note in Matlab's lognfit documentation that says that when censoring is not used, lognfit computes sigma using the square root of the unbiased estimator of the variance. This corresponds to using ddof=1 in the above code.
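The two conventions can be compared side by side on the small data set from the test script. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import numpy as np

# Same small data set as in the test script
x = np.array([50., 50, 100, 200, 200, 300, 500])
logx = np.log(x)
n = logx.size

mu = logx.mean()
sigma_mle = logx.std(ddof=0)       # maximum-likelihood estimate (divides by n)
sigma_unbiased = logx.std(ddof=1)  # sqrt of the unbiased variance (divides by n - 1)

# The two estimates differ only by the factor sqrt(n / (n - 1))
ratio = sigma_unbiased / sigma_mle

print("mu             = %.8f" % mu)
print("sigma (ddof=0) = %.8f" % sigma_mle)
print("sigma (ddof=1) = %.8f" % sigma_unbiased)
print("ratio          = %.8f" % ratio)
```

Because the factor sqrt(n / (n - 1)) approaches 1 as n grows, this convention alone produces only a small difference in sigma for large data sets.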

Answer 1: score 6, accepted, 2013-03-26 08:59:35