
Wrong Exponential Power Plot - How to improve curve fit

Unfortunately, the power-law fit with scipy's curve_fit does not return a good fit. I tried passing p0 as an input argument with values close to the expected ones, but that did not help.

I would be very glad if someone could point out my problem.

# Imports 
from scipy.optimize import curve_fit
import numpy as np 
import matplotlib.pyplot as plt

# Data
data = [[0.004408724185371062, 78.78011887652593], [0.005507091456466967, 65.01330508350753], [0.007073553026306459, 58.13364205119446], [0.009417452253958304, 50.12258366028477], [0.01315330108197482, 44.22980301062208], [0.019648758406406834, 35.436139354228956], [0.03248060063099905, 28.359815190205957], [0.06366197723675814, 21.54769216720596], [0.17683882565766149, 14.532777174472574], [1.5915494309189533, 6.156872080264581]]

# Split the (x, y) pairs into separate arrays
x_data, y_data = np.array(data).T

# Power-law model: y = c * x**m
def func(x, m, c):
    return x**m * c

# Curve fit
coeff, _ = curve_fit(func, x_data, y_data)
m, c = coeff

# Plot the data and the fitted curve
# (start at the smallest x rather than 0, where x**m diverges for m < 0)
x_function = np.linspace(x_data.min(), 1.5, 100)
yfunction = func(x_function, m, c)
plt.scatter(x_data, y_data, s=30, marker="v")
plt.plot(x_function, yfunction, '-')
plt.show()

Another dataset for which the fit is really bad would be:

data = [[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]]

I might be missing something, but I think curve_fit works just fine. When I compare the residuals obtained by curve_fit to those obtained with the Excel parameters you provided in the comments, the Python results always lead to lower residuals (code is provided below). You say "Unfortunately, the power fit with scipy does not return a good fit," but what exactly is your measure of a "good fit"? With respect to the sum of squared residuals, the Python fit always seems to be better than the Excel fit.
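One way to make "good fit" concrete is to compute the sum of squared residuals twice: on the original scale, which is what an unweighted curve_fit minimizes, and on the log scale. The following is a minimal sketch for the first data set, assuming the Excel parameters quoted in the comments (m = -0.425, c = 7.027). The helper names `ssr_linear` and `ssr_log` are made up for this illustration, and passing the Excel values as `p0` is my own choice, not something taken from the question.

```python
import numpy as np
from scipy.optimize import curve_fit

# First data set from the question
data = np.array([
    [0.004408724185371062, 78.78011887652593],
    [0.005507091456466967, 65.01330508350753],
    [0.007073553026306459, 58.13364205119446],
    [0.009417452253958304, 50.12258366028477],
    [0.01315330108197482, 44.22980301062208],
    [0.019648758406406834, 35.436139354228956],
    [0.03248060063099905, 28.359815190205957],
    [0.06366197723675814, 21.54769216720596],
    [0.17683882565766149, 14.532777174472574],
    [1.5915494309189533, 6.156872080264581],
])
x, y = data.T

def power_law(x, m, c):
    return c * np.power(x, m)

def ssr_linear(params):
    """Sum of squared residuals on the original scale (hypothetical helper)."""
    return np.sum((y - power_law(x, *params)) ** 2)

def ssr_log(params):
    """Sum of squared residuals after taking logs of data and model (hypothetical helper)."""
    return np.sum((np.log(y) - np.log(power_law(x, *params))) ** 2)

excel_params = (-0.425, 7.027)  # values quoted from the comments
# Assumption: start the optimizer at the Excel parameters
fit_params, _ = curve_fit(power_law, x, y, p0=excel_params)

for name, params in [("excel", excel_params), ("curve_fit", fit_params)]:
    print(name, "linear SSR:", ssr_linear(params), "log SSR:", ssr_log(params))
```

If the two parameter sets rank differently under the two measures, neither fit is simply "wrong": the linear measure is dominated by the points with large y, while the log measure penalizes relative errors.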

I am not sure whether it has to be exactly this function, but if not, you could also consider adding a third parameter to your function (named "d" below), which will lead to better results.
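A minimal sketch of that three-parameter variant on the first data set might look as follows. The function name `power_law_offset`, the initial guess `p0` (the Excel power-law parameters with zero offset) and the raised `maxfev` are assumptions made for this illustration. Whether a constant offset is meaningful for your data is something only you can judge.

```python
import numpy as np
from scipy.optimize import curve_fit

# First data set from the question
data = np.array([
    [0.004408724185371062, 78.78011887652593],
    [0.005507091456466967, 65.01330508350753],
    [0.007073553026306459, 58.13364205119446],
    [0.009417452253958304, 50.12258366028477],
    [0.01315330108197482, 44.22980301062208],
    [0.019648758406406834, 35.436139354228956],
    [0.03248060063099905, 28.359815190205957],
    [0.06366197723675814, 21.54769216720596],
    [0.17683882565766149, 14.532777174472574],
    [1.5915494309189533, 6.156872080264581],
])
x, y = data.T

def power_law_offset(x, m, c, d):
    # Power law with an additive constant d
    return c * np.power(x, m) + d

# Assumed initial guess: Excel power-law parameters, no offset
p0 = (-0.425, 7.027, 0.0)
coeff, _ = curve_fit(power_law_offset, x, y, p0=p0, maxfev=10000)

# Sum of squared residuals before and after the optimization
ssr_start = np.sum((y - power_law_offset(x, *p0)) ** 2)
ssr_fit = np.sum((y - power_law_offset(x, *coeff)) ** 2)
print("m, c, d:", coeff)
print("SSR at initial guess:", ssr_start, "after fit:", ssr_fit)
```

The extra parameter can only lower the residuals relative to the two-parameter model, so a smaller number here does not by itself show that the offset belongs in the model.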

Here is the modified code. I changed your "func" and also increased the resolution of the plot. The sum of squared residuals is printed as well. For the first data set, it is around 79.35 with the Excel parameters and around 34.29 with Python. For the second data set, it is 15220.79 with Excel and 601.08 with Python (assuming I did not mess anything up).

from scipy.optimize import curve_fit
import numpy as np 
import matplotlib.pyplot as plt

# Data
data = [[0.004408724185371062, 78.78011887652593], [0.005507091456466967, 65.01330508350753], [0.007073553026306459, 58.13364205119446], [0.009417452253958304, 50.12258366028477], [0.01315330108197482, 44.22980301062208], [0.019648758406406834, 35.436139354228956], [0.03248060063099905, 28.359815190205957], [0.06366197723675814, 21.54769216720596], [0.17683882565766149, 14.532777174472574], [1.5915494309189533, 6.156872080264581]]
#data = [[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]]
# Split the (x, y) pairs into separate arrays
x_data, y_data = np.array(data).T

# Power-law model
def func(x, m, c):
    # slightly rewritten; you could also consider using a third parameter d
    return c*np.power(x, m)  # + d

# Curve fit
coeff, _ = curve_fit(func, x_data, y_data)
m, c = coeff[0], coeff[1]  #, coeff[2]
print(m, c)  #, d

# Plot function
plt.scatter(x_data, y_data, s=30, marker="v")
# start at the smallest x rather than 0, where the power law diverges for m < 0
x_function = np.linspace(x_data.min(), 1.5, 1000)
yfunction = c*np.power(x_function, m)  # + d
plt.plot(x_function, yfunction, '-')
plt.show()
print("residuals python:", ((y_data - func(x_data, *coeff))**2).sum())
# compare to excel, first data set
print("residuals excel:", ((y_data - func(x_data, -0.425, 7.027))**2).sum())
# compare to excel, second data set
print("residuals excel:", ((y_data - func(x_data, -0.841, 1.0823))**2).sum())

Taking your second dataset as an example: if you plot the raw data, a difficulty becomes obvious. Your data are very non-uniform: most points are clustered at small x, and y spans about two orders of magnitude. Since your function has a pure power-law form, it is easiest to do the fitting on a log scale:

In [1]: import numpy as np

In [2]: import matplotlib.pyplot as plt

In [3]: plt.ion()

In [4]: data = [[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]]

In [5]: data = np.asarray(data)   # just for convenience

In [6]: data.shape
Out[6]: (10, 2)

In [7]: x, y = data[:, 0], data[:, 1]

In [8]: lx, ly = np.log(x), np.log(y)

In [9]: plt.plot(lx, ly, 'ro')
Out[9]: [<matplotlib.lines.Line2D at 0x323a250>]

In [10]: def lfunc(x, a, b):
   ....:     return a*x + b
   ....: 

In [11]: from scipy.optimize import curve_fit

In [12]: opt, cov = curve_fit(lfunc, lx, ly)

In [13]: opt
Out[13]: array([-0.84071518,  0.07906558])

In [14]: plt.plot(lx, lfunc(lx, *opt), 'b-')
Out[14]: [<matplotlib.lines.Line2D at 0x3be0f90>]
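The straight line fitted on the log scale maps directly back to the power law: since log y = a*log x + b, the exponent is m = a and the prefactor is c = exp(b). Here is a minimal sketch of that conversion, using np.polyfit as a stand-in for the curve_fit call above (for a straight line both solve the same least-squares problem). The names `m` and `c` follow the question's `func`.

```python
import numpy as np

# Second data set from the question
data = np.array([
    [0.004408724185371062, 194.04075083542443],
    [0.005507091456466967, 146.09194314074864],
    [0.007073553026306459, 120.2115882821158],
    [0.009417452253958304, 74.04014371874908],
    [0.01315330108197482, 34.167114633194736],
    [0.019648758406406834, 12.775528348369871],
    [0.03248060063099905, 7.903195816871708],
    [0.06366197723675814, 5.186092050500438],
    [0.17683882565766149, 3.260540592404184],
    [1.5915494309189533, 2.006254812978579],
])
x, y = data.T

# Straight-line fit in log-log space: log(y) = a*log(x) + b
a, b = np.polyfit(np.log(x), np.log(y), 1)

# Map back to the power law y = c * x**m
m, c = a, np.exp(b)
print(m, c)
```

With the slope and intercept shown in the session above, this gives m of about -0.841 and c of about 1.082, which are the Excel values used for the second data set in the other answer. That suggests the Excel trendline is also fitted on the log scale, so the two tools are minimizing different residuals.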

Whether this is an adequate model for the data is a separate concern.

