numpy.polyfit versus scipy.odr
I have a data set which in theory is described by a polynomial of the second degree. I would like to fit this data and I have used numpy.polyfit to do this. However, the downside is that the errors on the returned coefficients are not available. Therefore I decided to also fit the data using scipy.odr. The weird thing was that the coefficients for the polynomial deviated from each other.
I do not understand this and therefore decided to test both fitting routines on a set of data that I produce myself:
import numpy
import scipy.odr
import matplotlib.pyplot as plt
x = numpy.arange(-20, 20, 0.1)
y = 1.8 * x**2 -2.1 * x + 0.6 + numpy.random.normal(scale = 100, size = len(x))
#Define function for scipy.odr
def fit_func(p, t):
    return p[0] * t**2 + p[1] * t + p[2]
#Fit the data using numpy.polyfit
fit_np = numpy.polyfit(x, y, 2)
#Fit the data using scipy.odr
Model = scipy.odr.Model(fit_func)
Data = scipy.odr.RealData(x, y)
Odr = scipy.odr.ODR(Data, Model, [1.5, -2, 1], maxit = 10000)
output = Odr.run()
#output.pprint()
beta = output.beta
betastd = output.sd_beta
print("poly", fit_np)
print("ODR", beta)
plt.plot(x, y, "bo")
plt.plot(x, numpy.polyval(fit_np, x), "r--", lw = 2)
plt.plot(x, fit_func(beta, x), "g--", lw = 2)
plt.tight_layout()
plt.show()
An example of an outcome is as follows:
poly [ 1.77992643 -2.42753714 3.86331152]
ODR [ 3.8161735 -23.08952492 -146.76214989]
In the included image, the solution of numpy.polyfit (red dashed line) corresponds pretty well to the data. The solution of scipy.odr (green dashed line) is basically completely off. I do have to note that the difference between numpy.polyfit and scipy.odr was smaller in the actual data set I wanted to fit. However, I do not understand where the difference between the two comes from, why the difference is so large in my own test example, and which fitting routine is better.
I hope you can provide answers that give me a better understanding of the two fitting routines and, in the process, answer the questions I have.
In the way you are using ODR it does a full orthogonal distance regression. To have it do a normal nonlinear least squares fit, add
Odr.set_job(fit_type=2)
before starting the optimization and you will get what you expected.
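The suggestion above can be tried end to end with a minimal sketch like the following. It reuses the model and data recipe from the question; the fixed random seed, the variable names, and the use of `cov=True` in `numpy.polyfit` to obtain coefficient uncertainties are assumptions added for illustration, not part of the original posts.

```python
import numpy as np
import scipy.odr

# Fixed seed is an assumption, used only to make the run reproducible
rng = np.random.default_rng(0)
x = np.arange(-20, 20, 0.1)
y = 1.8 * x**2 - 2.1 * x + 0.6 + rng.normal(scale=100, size=len(x))

def fit_func(p, t):
    # Same parameter order as numpy.polyfit: highest power first
    return p[0] * t**2 + p[1] * t + p[2]

# Ordinary least squares; cov=True also returns the covariance matrix
coef_np, cov_np = np.polyfit(x, y, 2, cov=True)
std_np = np.sqrt(np.diag(cov_np))

# ODR restricted to ordinary least squares via fit_type=2
odr = scipy.odr.ODR(scipy.odr.RealData(x, y), scipy.odr.Model(fit_func),
                    beta0=[1.5, -2, 1], maxit=10000)
odr.set_job(fit_type=2)
out = odr.run()

print("poly", coef_np, std_np)
print("ODR ", out.beta, out.sd_beta)
```

With `fit_type=2` both routines minimize the same vertical residuals, so the two coefficient sets should be close; the printed standard deviations can then be compared side by side.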
The reason that the full ODR fails so badly is that no weights/standard deviations were specified. Evidently it has a hard time interpreting that point cloud and assumes equal weights for x and y. If you provide estimated standard deviations, ODR will yield a good (though of course different) result, too.
Data = scipy.odr.RealData(x, y, sx=0.1, sy=10)
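To make the role of the standard deviations concrete: as far as I understand the SciPy documentation, `RealData` converts `sx` and `sy` into weights as 1/sx² and 1/sy². The sketch below, which assumes the data recipe from the question plus a fixed seed and a hypothetical helper `run_odr`, fits once with standard deviations and once with the equivalent explicit weights passed to `scipy.odr.Data`.

```python
import numpy as np
import scipy.odr

# Fixed seed is an assumption, used only to make the run reproducible
rng = np.random.default_rng(0)
x = np.arange(-20, 20, 0.1)
y = 1.8 * x**2 - 2.1 * x + 0.6 + rng.normal(scale=100, size=len(x))

def fit_func(p, t):
    return p[0] * t**2 + p[1] * t + p[2]

def run_odr(data):
    # Hypothetical helper: full ODR (default job) on the given data object
    odr = scipy.odr.ODR(data, scipy.odr.Model(fit_func),
                        beta0=[1.5, -2, 1], maxit=10000)
    return odr.run()

# Standard deviations as in the answer above
out_sd = run_odr(scipy.odr.RealData(x, y, sx=0.1, sy=10))

# The same information expressed as weights: wd = 1/sx**2, we = 1/sy**2
out_wt = run_odr(scipy.odr.Data(x, y, wd=1 / 0.1**2, we=1 / 10**2))

print("RealData(sx, sy):", out_sd.beta)
print("Data(wd, we):    ", out_wt.beta)
```

Only the ratio between the x and y weights changes where ODR places the curve, which is why leaving both unspecified (equal weights) behaves so differently from the weighted fit.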
The actual problem is that the ODR output has the beta coefficients in the opposite order from the one numpy.polyfit uses, so the green curve is not calculated correctly. To plot it, use instead
plt.plot(x, fit_func(beta[::-1], x), "g--", lw = 2)
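Whether a reversal is needed depends on how the model function indexes its parameter vector, since scipy.odr returns beta in the same order the model function uses. A small check, assuming the `fit_func` exactly as written in the question and illustrative coefficient values, compares both orderings against `numpy.polyval`:

```python
import numpy as np

# Highest power first, the order numpy.polyfit returns and numpy.polyval expects
coeffs = np.array([1.8, -2.1, 0.6])

def fit_func(p, t):
    # Model as written in the question: p[0] multiplies the highest power
    return p[0] * t**2 + p[1] * t + p[2]

t = np.linspace(-20, 20, 5)

# Does fit_func agree with polyval when given the coefficients as-is?
same_order = np.allclose(fit_func(coeffs, t), np.polyval(coeffs, t))
# Does it agree when the coefficients are reversed first?
reversed_order = np.allclose(fit_func(coeffs[::-1], t), np.polyval(coeffs, t))

print(same_order, reversed_order)  # prints: True False
```

For a model written lowest power first (for example `p[0] + p[1] * t + p[2] * t**2`) the result flips, and reversing beta before comparing with numpy.polyfit would be the right step.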