
Exponential Regression in Python

I have a set of x and y data and I want to use exponential regression to find the curve that best fits that set of points, i.e.:

y = P1 + P2 exp(-P0 x)

I want to calculate the values of P0, P1 and P2.

I use a software package, "Igor Pro", that calculates the values for me, but I want a Python implementation. I used the curve_fit function, but the values that I get are nowhere near the ones calculated by the Igor software. Here are the sets of data that I have:

Set 1:

x = [ 1.06, 1.06, 1.06, 1.06, 1.06, 1.06, 0.91, 0.91, 0.91 ]
y = [ 476, 475, 476.5, 475.25, 480, 469.5, 549.25, 548.5, 553.5 ]

Values calculated by Igor:

P1=376.91, P2=5393.9, P0=3.7776

Values calculated by curve_fit:

P1=702.45, P2=-13.33, P0=-2.6744

Set 2:

x = [ 1.36, 1.44, 1.41, 1.745, 2.25, 1.42, 1.45, 1.5, 1.58]
y = [ 648, 618, 636, 485, 384, 639, 630, 583, 529]

Values calculated by Igor:

P1=321, P2=4848, P0=-1.94

Values calculated by curve_fit:

No optimal values found

I use curve_fit as follows:

import numpy as np
from scipy.optimize import curve_fit

popt, pcov = curve_fit(lambda t, a, b, c: a * np.exp(-b * t) + c, x, y)

where:

P1=c, P2=a and P0=b

Well, when comparing fit results, it is always important to include the uncertainties in the fitted parameters. That is, when you say that the values from Igor (P1=376.91, P2=5393.9, P0=3.7776) and from curve_fit (P1=702.45, P2=-13.33, P0=-2.6744) are different, what is it that leads you to conclude those values are actually different?

Of course, in everyday conversation, 376.91 and 702.45 are very different, mostly because simply stating a value to 2 decimal places implies accuracy at approximately that scale (the distance between New York and Tokyo is 10,850 km but is not really 1,084,702,431 cm -- that might be the distance between particular bus stops in the two cities). But when comparing fit results, that everyday knowledge cannot be assumed; you have to include uncertainties. I don't know whether Igor will give you those. scipy's curve_fit can, but it requires some work to extract them -- a pity.
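For what it's worth, one-sigma uncertainties can be pulled out of the covariance matrix that curve_fit returns; a minimal sketch on your Set 1 data (the parameter order a, b, c matches your lambda):

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([1.06, 1.06, 1.06, 1.06, 1.06, 1.06, 0.91, 0.91, 0.91])
y = np.array([476, 475, 476.5, 475.25, 480, 469.5, 549.25, 548.5, 553.5])

popt, pcov = curve_fit(lambda t, a, b, c: a * np.exp(-b * t) + c, x, y)
perr = np.sqrt(np.diag(pcov))  # 1-sigma uncertainties for a, b, c
for name, val, err in zip("abc", popt, perr):
    print(f"{name} = {val:.4g} +/- {err:.4g}")
```

On this dataset you should see uncertainties that dwarf the values themselves, which is the point made below.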

Allow me to recommend trying lmfit (disclaimer: I am an author). With that, you would set up and execute the fit like this:

import numpy as np
from lmfit import Model
    
x = np.array([ 1.06, 1.06, 1.06, 1.06, 1.06, 1.06, 0.91, 0.91, 0.91 ])
y = np.array([ 476, 475, 476.5, 475.25, 480, 469.5, 549.25, 548.5, 553.5 ])
# x = np.array([ 1.36, 1.44, 1.41, 1.745, 2.25, 1.42, 1.45, 1.5, 1.58])
# y = np.array([ 648, 618, 636, 485, 384, 639, 630, 583, 529])

# Define the function that we want to fit to the data
def func(x, offset, scale, decay):
    return offset + scale * np.exp(-decay* x)
    
model = Model(func)
params = model.make_params(offset=375, scale=5000, decay=4)
    
result = model.fit(y, params, x=x)
    
print(result.fit_report())

This would print out a report of:

[[Model]]
    Model(func)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 49
    # data points      = 9
    # variables        = 3
    chi-square         = 72.2604167
    reduced chi-square = 12.0434028
    Akaike info crit   = 24.7474672
    Bayesian info crit = 25.3391410
    R-squared          = 0.99362489
[[Variables]]
    offset:  413.168769 +/- 17348030.9 (4198775.95%) (init = 375)
    scale:   16689.6793 +/- 1.3337e+10 (79909638.11%) (init = 5000)
    decay:   5.27555726 +/- 1016721.11 (19272297.84%) (init = 4)
[[Correlations]] (unreported correlations are < 0.100)
    C(scale, decay)  = 1.000
    C(offset, decay) = 1.000
    C(offset, scale) = 1.000

indicating that the uncertainties in the parameter values are simply enormous and the correlations between all parameters are 1. This is because you have only 2 distinct x values, which makes it impossible to accurately determine 3 independent parameters.

And note that, with an uncertainty of 17 million, the values for P1 (offset) of 413 and 702 do actually agree. The problem is not that Igor and curve_fit disagree on the best value; it is that neither can determine the value with any accuracy at all.
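The degeneracy is easy to demonstrate directly: with only two distinct x values, once you pick any decay you like, the other two parameters can be solved exactly from the two cluster means. A small sketch (the decay values below are arbitrary choices, not fitted):

```python
import numpy as np

# cluster means of Set 1: there are only two distinct x values
x1, y1 = 1.06, np.mean([476, 475, 476.5, 475.25, 480, 469.5])
x2, y2 = 0.91, np.mean([549.25, 548.5, 553.5])

for decay in (1.0, 4.0, 10.0):  # arbitrary: any positive decay works
    # two equations, two unknowns once decay is fixed:
    #   y1 = offset + scale*exp(-decay*x1)
    #   y2 = offset + scale*exp(-decay*x2)
    e1, e2 = np.exp(-decay * x1), np.exp(-decay * x2)
    scale = (y1 - y2) / (e1 - e2)
    offset = y1 - scale * e1
    # every such (offset, scale, decay) passes exactly through both means
    print(f"decay={decay:5.1f} -> offset={offset:10.2f}, scale={scale:12.2f}")
```

Each triple reproduces the two cluster means exactly, so the data cannot choose between them.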

For your other dataset, the situation is a little better, with this result:

[[Model]]
    Model(func)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 82
    # data points      = 9
    # variables        = 3
    chi-square         = 1118.19957
    reduced chi-square = 186.366596
    Akaike info crit   = 49.4002551
    Bayesian info crit = 49.9919289
    R-squared          = 0.98272310
[[Variables]]
    offset:  320.876843 +/- 42.0154403 (13.09%) (init = 375)
    scale:   4797.14487 +/- 2667.40083 (55.60%) (init = 5000)
    decay:   1.93560164 +/- 0.47764470 (24.68%) (init = 4)
[[Correlations]] (unreported correlations are < 0.100)
    C(scale, decay)  = 0.995
    C(offset, decay) = 0.940
    C(offset, scale) = 0.904

The correlations are still high, but the parameters are reasonably well determined. Also, note that the best-fit values here are much closer to those you got from Igor, and probably "within the uncertainty".

And this is why one always needs to include uncertainties with the best-fit values reported from a fit.

Set 1:

x = [ 1.06, 1.06, 1.06, 1.06, 1.06, 1.06, 0.91, 0.91, 0.91 ]

y = [ 476, 475, 476.5, 475.25, 480, 469.5, 549.25, 548.5, 553.5 ]

[image: plot of the Set 1 data]

One observes that there are only two distinct values of x: 1.06 and 0.91.

On the other hand, there are three parameters to optimise: P0, P1 and P2. That is too many.

In other words, an infinity of exponential curves can be found that fit the two clusters of points. The differences between the curves can be due to slight differences in the computation methods of nonlinear regression, especially in the methods used to choose the initial values of the iterative process.

In this particular case, a simple linear regression would be unambiguous.
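Indeed, with only two distinct x values a straight line already passes exactly through both cluster means; a quick sketch with numpy:

```python
import numpy as np

x = np.array([1.06] * 6 + [0.91] * 3)
y = np.array([476, 475, 476.5, 475.25, 480, 469.5, 549.25, 548.5, 553.5])

# ordinary least-squares line y = m*x + q; polyfit returns (m, q)
m, q = np.polyfit(x, y, 1)
print(f"slope = {m:.2f}, intercept = {q:.2f}")
```

The fitted line hits both group means exactly, and unlike the exponential it is the unique least-squares solution.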

By comparison:

[image: plot comparing the Igor and curve_fit exponential fits to the Set 1 data]

Thus both Igor and curve_fit give excellent fits: the points are very close to both curves. One understands that infinitely many other exponential functions would fit just as well.


Set 2:

x = [ 1.36, 1.44, 1.41, 1.745, 2.25, 1.42, 1.45, 1.5, 1.58]

y = [ 648, 618, 636, 485, 384, 639, 630, 583, 529]

The difficulty that you met might be due to the choice of "guessed" initial values for the parameters, which are required to start the iterative process of nonlinear regression.
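That hypothesis is easy to test on Set 2: curve_fit's default start of (1, 1, 1) fails here, but supplying even a rough initial guess via the p0 argument (the values below are chosen by eye, not computed) lets it converge:

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([1.36, 1.44, 1.41, 1.745, 2.25, 1.42, 1.45, 1.5, 1.58])
y = np.array([648, 618, 636, 485, 384, 639, 630, 583, 529])

def func(t, offset, scale, decay):
    return offset + scale * np.exp(-decay * t)

# rough starting values chosen by eye -- an assumption, not magic numbers
popt, pcov = curve_fit(func, x, y, p0=(300, 5000, 2))
perr = np.sqrt(np.diag(pcov))
print(popt)  # converges near (321, 4800, 1.94)
```

With that start, curve_fit lands essentially on the Igor result for this dataset.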

In order to check this hypothesis, one can use a different method which doesn't need initial guessed values. The MathCad code and numerical results are shown below.

[images: MathCad code and numerical results for the integral-equation fit]

Don't be surprised if the values of the parameters that you get with your software are slightly different from the above values (a, b, c). The fitting criterion implicitly set in your software is probably different from the fitting criterion set in mine.

[image: plot of the Set 2 data with the fitted exponential curve]

Blue curve: the method of regression is least mean square error with respect to a linear integral equation of which the exponential function is a solution. Ref.: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales

This non-standard method isn't iterative and doesn't require initial "guessed" values of the parameters.
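For readers without MathCad, here is a rough Python transcription of that integral-equation approach (my own sketch following the linked reference, not the original code): the cumulative trapezoidal integral of y turns the model y = a + b*exp(c*x) into a linear relation, so c comes from one linear least-squares solve and (a, b) from a second.

```python
import numpy as np

def fit_exp_integral(x, y):
    """Estimate (a, b, c) in y = a + b*exp(c*x) without initial guesses."""
    order = np.argsort(x)                 # the integral needs ordered abscissae
    x = np.asarray(x, float)[order]
    y = np.asarray(y, float)[order]
    # cumulative trapezoidal integral S_k of y over x
    S = np.zeros_like(y)
    S[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    dx, dy = x - x[0], y - y[0]
    # the model satisfies y_k - y_1 ~ A*(x_k - x_1) + c*S_k: solve for (A, c)
    M = np.array([[np.sum(dx * dx), np.sum(dx * S)],
                  [np.sum(dx * S),  np.sum(S * S)]])
    v = np.array([np.sum(dx * dy), np.sum(S * dy)])
    _, c = np.linalg.solve(M, v)
    # with c fixed, (a, b) follow from ordinary linear least squares
    theta = np.exp(c * x)
    a, b = np.linalg.lstsq(np.column_stack([np.ones_like(x), theta]),
                           y, rcond=None)[0]
    return a, b, c

x = np.array([1.36, 1.44, 1.41, 1.745, 2.25, 1.42, 1.45, 1.5, 1.58])
y = np.array([648, 618, 636, 485, 384, 639, 630, 583, 529])
a, b, c = fit_exp_integral(x, y)
print(f"a = {a:.1f}, b = {b:.1f}, c = {c:.3f}")
```

On these unevenly spaced points the crude trapezoidal integral makes the estimates differ somewhat from the iterative fits, but they make serviceable starting values for curve_fit.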
