Fitting a log-normal model to data using LMFIT

Question

I am looking to fit a log-normal curve to data that roughly follows a log-normal distribution.

The data I have is from a laser diffraction machine which measures particle size distributions of sprays. The ultimate goal of this code is to recreate this method for my data, which uses OriginPro software designed for XRD data curve fitting, a similar problem. I would like to integrate the method into my own analysis for my research, which is being done in Python.

I adapted the code from this post to (ideally) handle log-normal distributions. I have simplified my code to handle only the first log-normal peak in the data, so it now tries to fit only ONE log-normal distribution. The data I have provided are also simplified to have only one peak to fit. Sample data and code are given at the bottom of this post.

I have some previous experience with model fitting using LMFIT, though I was using a user-defined state-space model for temporal modelling and the LMFIT minimize() function. I am unsure of where to even start debugging the curve-fitting component of this code.

Can anyone help me figure out why I am unable to get a fit to this data? Note that the result I am getting is a trivial one (a straight line at y=0).

Working on Windows 7 (laptop) and 10 (desktop)

Running python -V in a CMD window gives:

Python 3.5.3 :: Anaconda 4.1.1 (64-bit)

Here's the data for a sample distribution:

sizes = np.array([  1.26500000e-01,   1.47000000e-01,   1.71500000e-01,
     2.00000000e-01,   2.33000000e-01,   2.72000000e-01,
     3.17000000e-01,   3.69500000e-01,   4.31000000e-01,
     5.02500000e-01,   5.86000000e-01,   6.83500000e-01,
     7.97000000e-01,   9.29000000e-01,   1.08300000e+00,
     1.26250000e+00,   1.47200000e+00,   1.71650000e+00,
     2.00100000e+00,   2.33300000e+00,   2.72050000e+00,
     3.17200000e+00,   3.69800000e+00,   4.31150000e+00,
     5.02700000e+00,   5.86100000e+00,   6.83300000e+00,
     7.96650000e+00,   9.28850000e+00,   1.08295000e+01,
     1.26265000e+01,   1.47215000e+01,   1.71640000e+01,
     2.00115000e+01,   2.33315000e+01,   2.72030000e+01,
     3.17165000e+01,   3.69785000e+01,   4.31135000e+01,
     5.02665000e+01,   5.86065000e+01,   6.83300000e+01,
     7.96670000e+01,   9.28850000e+01,   1.08296000e+02,
     1.26264000e+02,   1.47213000e+02,   1.71637500e+02,
     2.00114500e+02,   2.33316500e+02])

y_exp = np.array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.01,
    0.02,  0.03,  0.04,  0.06,  0.07,  0.08,  0.09,  0.1 ,  0.11,
    0.13,  0.19,  0.3 ,  0.48,  0.74,  1.1 ,  1.56,  2.11,  2.72,
    3.37,  3.99,  4.55,  4.99,  5.3 ,  5.48,  5.53,  5.48,  5.36,
    5.19,  4.97,  4.67,  4.28,  3.79,  3.18,  2.48,  1.73,  1.  ,
    0.35,  0.  ,  0.  ,  0.  ,  0.  ])

Here are the functions:


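import random

import numpy as np
import matplotlib.pyplot as plt
from lmfit import models
from scipy import signal


def generate_model(spec):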
    composite_model = None
    params = None
    x = spec['x']
    y = spec['y']
    x_min = np.min(x)
    x_max = np.max(x)
    x_range = x_max - x_min
    y_max = np.max(y)
    for i, basis_func in enumerate(spec['model']):
#        prefix = f'm{i}_'
        prefix = 'm{0}_'.format(i)
        model = getattr(models, basis_func['type'])(prefix=prefix)
        if basis_func['type'] in ['LognormalModel','GaussianModel', 'LorentzianModel', 'VoigtModel']: # for now VoigtModel has gamma constrained to sigma
            model.set_param_hint('sigma', min=1e-6, max=x_range)
            model.set_param_hint('center', min=x_min, max=x_max)
            model.set_param_hint('height', min=1e-6, max=1.1*y_max)
            model.set_param_hint('amplitude', min=1e-6)
            # default guess is horrible!! do not use guess()
            default_params = {
                prefix+'center': x_min + x_range * random.random(),
                prefix+'height': y_max * random.random(),
                prefix+'sigma': x_range * random.random()
                }
        else:
#            raise NotImplementedError(f'model {basis_func["type"]} not implemented yet')
            raise NotImplementedError('model {0} not implemented yet'.format(basis_func["type"]))
        if 'help' in basis_func:  # allow override of settings in parameter
            for param, options in basis_func['help'].items():
                model.set_param_hint(param, **options)
        model_params = model.make_params(**default_params, **basis_func.get('params', {}))
        if params is None:
            params = model_params
        else:
            params.update(model_params)
        if composite_model is None:
            composite_model = model
        else:
            composite_model = composite_model + model
    return composite_model, params

def update_spec_from_peaks(spec, model_indicies, peak_widths=np.arange(1,10), **kwargs):
    x = spec['x']
    y = spec['y']
    x_range = np.max(x) - np.min(x)
    peak_indicies = signal.find_peaks_cwt(y, peak_widths)
    np.random.shuffle(peak_indicies)
#    for peak_indicie, model_indicie in zip(peak_indicies.tolist(), model_indicies):
    for peak_indicie, model_indicie in zip(peak_indicies, model_indicies):
        model = spec['model'][model_indicie]
        if model['type'] in ['LognormalModel','GaussianModel', 'LorentzianModel', 'VoigtModel']:
            params = {
                'height': y[peak_indicie],
                'sigma': x_range / len(x) * np.min(peak_widths),
                'center': x[peak_indicie]
            }
            if 'params' in model:
                model.update(params)
            else:
                model['params'] = params
        else:
#            raise NotImplementedError(f'model {model["type"]} not implemented yet')
            raise NotImplementedError('model {0} not implemented yet'.format(model["type"]))
    return peak_indicies

Here is the mainline:

spec = {
    'x': sizes,
    'y': y_exp,
    'model': [
        {
            'type': 'LognormalModel',
            'params': {'center': 20, 'height': 3, 'sigma': 1},
#            'help': {'center': {'min': 10, 'max': 30}}
        }]}

num_comp = list(range(0,len(spec['model'])))

peaks_found = update_spec_from_peaks(spec, num_comp, peak_widths=np.arange(1,10))

#For checking peak fitting
print(peaks_found)
fig, ax = plt.subplots()
ax.scatter(spec['x'], spec['y'], s=4)
for i in peaks_found:
    ax.axvline(x=spec['x'][i], c='black', linestyle='dotted')

model, params = generate_model(spec)

output = model.fit(spec['y'], params, x=spec['x'])

fig, gridspec = output.plot()

Thanks for any help, and make it a great day.

Isaac

Answer 1

The standard advice on Stack Overflow, and for problem-solving in general, is to reduce the problem to a minimal script that shows the problem. See, for example, https://stackoverflow.com/help/mcve . This approach encourages stripping the problem down and often helps point to where the problem is in your code. It is a classic approach to problem-solving.

It turns out that your script has quite a bit of extra stuff. Stripping it down to the essentials would give:

import numpy as np
from lmfit import models
import matplotlib.pyplot as plt

x = np.array([ 1.26500000e-01, 1.47000000e-01, 1.71500000e-01,
            2.00000000e-01, 2.33000000e-01, 2.72000000e-01,
            3.17000000e-01, 3.69500000e-01, 4.31000000e-01,
            5.02500000e-01, 5.86000000e-01, 6.83500000e-01,
            7.97000000e-01, 9.29000000e-01, 1.08300000e+00,
            1.26250000e+00, 1.47200000e+00, 1.71650000e+00,
            2.00100000e+00, 2.33300000e+00, 2.72050000e+00,
            3.17200000e+00, 3.69800000e+00, 4.31150000e+00,
            5.02700000e+00, 5.86100000e+00, 6.83300000e+00,
            7.96650000e+00, 9.28850000e+00, 1.08295000e+01,
            1.26265000e+01, 1.47215000e+01, 1.71640000e+01,
            2.00115000e+01, 2.33315000e+01, 2.72030000e+01,
            3.17165000e+01, 3.69785000e+01, 4.31135000e+01,
            5.02665000e+01, 5.86065000e+01, 6.83300000e+01,
            7.96670000e+01, 9.28850000e+01, 1.08296000e+02,
            1.26264000e+02, 1.47213000e+02, 1.71637500e+02,
            2.00114500e+02, 2.33316500e+02])

y = np.array([ 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.01, 0.02,
           0.03, 0.04, 0.06, 0.07, 0.08, 0.09, 0.1 , 0.11, 0.13, 0.19,
           0.3 , 0.48, 0.74, 1.1 , 1.56, 2.11, 2.72, 3.37, 3.99, 4.55,
           4.99, 5.3 , 5.48, 5.53, 5.48, 5.36, 5.19, 4.97, 4.67, 4.28,
           3.79, 3.18, 2.48, 1.73, 1.  , 0.35, 0.  , 0.  , 0.  , 0.  ])

model = models.LognormalModel()
params = model.make_params(center=20, sigma=3, amplitude=5)

result = model.fit(y, params, x=x)
print(result.fit_report())

plt.plot(x, y, label='data')
plt.plot(x, result.best_fit, label='fit')
plt.legend()
plt.show()

This runs and gives a decent, if not quite perfect, fit.

In general, I would discourage you from setting "parameter hints" based on data ranges. Use the ability to set such limits sparingly, and only where they are inherent to the model (for example, sigma<0 makes no sense).

I have no idea what your code that uses random numbers to set initial values is meant to do, but it sure looks to me like it is likely to set initial values that are extremely poor choices.
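One possible alternative to random starting values is to estimate them from the data itself. The sketch below is an assumption-laden illustration, not part of lmfit: the helper names `trapezoid` and `guess_lognormal` are hypothetical, and it assumes a single peak with the log-normal form A / (sigma * sqrt(2*pi) * x) * exp(-(ln(x) - center)^2 / (2 * sigma^2)), so that center and sigma are moments of ln(x).

```python
import numpy as np


def lognormal(x, amplitude, center, sigma):
    """Log-normal lineshape; center and sigma are in units of ln(x)."""
    return (amplitude / (sigma * np.sqrt(2 * np.pi) * x)
            * np.exp(-(np.log(x) - center) ** 2 / (2 * sigma ** 2)))


def trapezoid(y, x):
    """Trapezoidal integral of y over x (hypothetical helper)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def guess_lognormal(x, y):
    """Moment-based starting values for a single log-normal peak (hypothetical helper)."""
    amplitude = trapezoid(y, x)                      # area under the curve
    log_x = np.log(x)
    center = trapezoid(y * log_x, x) / amplitude     # weighted mean of ln(x)
    variance = trapezoid(y * (log_x - center) ** 2, x) / amplitude
    return {'amplitude': amplitude, 'center': center, 'sigma': float(np.sqrt(variance))}


# Synthetic single-peak data (an assumption for this sketch).
x = np.logspace(-1, 3, 400)
y = lognormal(x, amplitude=5.0, center=3.0, sigma=0.6)

start = guess_lognormal(x, y)
print(start)
```

The resulting dictionary could then be passed along as `model.make_params(**start)`. On real data with noise, a baseline, or overlapping peaks, these estimates will be rougher and are only meant as a starting point.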
