numpy.polyfit versus scipy.odr

I have a data set which in theory is described by a polynomial of the second degree. I would like to fit this data and I have used numpy.polyfit to do this. However, the downside is that the error on the returned coefficients is not available. Therefore I decided to also fit the data using scipy.odr. The weird thing was that the two routines returned different polynomial coefficients.

I do not understand this, so I decided to test both fitting routines on a data set that I generate myself:

import numpy
import scipy.odr
import matplotlib.pyplot as plt

x = numpy.arange(-20, 20, 0.1)
y = 1.8 * x**2 -2.1 * x + 0.6 + numpy.random.normal(scale = 100, size = len(x))

#Define function for scipy.odr
def fit_func(p, t):
  return p[0] * t**2 + p[1] * t + p[2]

#Fit the data using numpy.polyfit
fit_np = numpy.polyfit(x, y, 2)

#Fit the data using scipy.odr
Model = scipy.odr.Model(fit_func)
Data = scipy.odr.RealData(x, y)
Odr = scipy.odr.ODR(Data, Model, [1.5, -2, 1], maxit = 10000)
output = Odr.run()
#output.pprint()
beta = output.beta
betastd = output.sd_beta

print("poly", fit_np)
print("ODR", beta)

plt.plot(x, y, "bo")
plt.plot(x, numpy.polyval(fit_np, x), "r--", lw = 2)
plt.plot(x, fit_func(beta, x), "g--", lw = 2)

plt.tight_layout()

plt.show()

An example of an outcome is as follows:

poly [ 1.77992643 -2.42753714  3.86331152]
ODR [   3.8161735   -23.08952492 -146.76214989]

[image: plot of the data with both fitted curves]

In the included image, the numpy.polyfit solution (red dashed line) matches the data pretty well, while the scipy.odr solution (green dashed line) is basically completely off. I should note that the difference between numpy.polyfit and scipy.odr was smaller for the actual data set I wanted to fit. However, I do not understand where the difference between the two comes from, why it is so large in my own test example, or which fitting routine is better.

I hope you can provide answers that help me better understand the two fitting routines and, in the process, answer the questions above.

The way you are using ODR, it performs a full orthogonal distance regression. To have it do an ordinary nonlinear least-squares fit, add

Odr.set_job(fit_type=2)

before starting the optimization and you will get what you expected.
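As a minimal sketch of that change, the following reruns the question's fit without plotting and with `fit_type=2` set before `run()`. The fixed random seed and the lowercase variable names are assumptions added here for reproducibility; they are not part of the original code. It also prints `sd_beta`, the coefficient standard errors the question was after.

```python
import numpy as np
import scipy.odr

# Assumption: a fixed seed so the run is reproducible.
rng = np.random.default_rng(0)
x = np.arange(-20, 20, 0.1)
y = 1.8 * x**2 - 2.1 * x + 0.6 + rng.normal(scale=100, size=len(x))

def fit_func(p, t):
    # Same parameter order as in the question: p[0] is the quadratic term.
    return p[0] * t**2 + p[1] * t + p[2]

fit_np = np.polyfit(x, y, 2)

odr = scipy.odr.ODR(
    scipy.odr.RealData(x, y),
    scipy.odr.Model(fit_func),
    beta0=[1.5, -2, 1],
    maxit=10000,
)
odr.set_job(fit_type=2)  # 2 selects ordinary least squares instead of full ODR
output = odr.run()

print("poly", fit_np)
print("ODR ", output.beta)
print("sd  ", output.sd_beta)
```

With the least-squares job selected, both routines minimize the same sum of squared residuals in y, so the two coefficient vectors should agree closely.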

[image: fit result]

The full ODR fails so badly because no weights/standard deviations are specified. It evidently has a hard time interpreting that point cloud when it assumes equal weights for x and y. If you provide estimated standard deviations, ODR will yield a good (though of course different) result, too.

Data = scipy.odr.RealData(x, y, sx=0.1, sy=10)
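A sketch of the weighted variant, using the `sx=0.1, sy=10` values from the line above and leaving the job at its default (full orthogonal distance regression). The seed is again an assumption for reproducibility. Only the relative size of `sx` and `sy` affects where the minimum lies, since scaling both weights by a common factor scales the objective without moving it.

```python
import numpy as np
import scipy.odr

# Assumption: a fixed seed so the run is reproducible.
rng = np.random.default_rng(0)
x = np.arange(-20, 20, 0.1)
y = 1.8 * x**2 - 2.1 * x + 0.6 + rng.normal(scale=100, size=len(x))

def fit_func(p, t):
    return p[0] * t**2 + p[1] * t + p[2]

fit_np = np.polyfit(x, y, 2)

# Estimated standard deviations tell ODR that x is far better known than y.
data = scipy.odr.RealData(x, y, sx=0.1, sy=10)
odr = scipy.odr.ODR(data, scipy.odr.Model(fit_func), beta0=[1.5, -2, 1], maxit=10000)
output = odr.run()  # default job: full orthogonal distance regression

print("poly", fit_np)
print("ODR ", output.beta)
```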

The actual problem is that the odr output has the beta coefficients in the opposite order from numpy.polyfit. So the green curve is not calculated correctly. To plot it, use instead

plt.plot(x, fit_func(beta[::-1], x), "g--", lw = 2)
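The order of `output.beta` follows however the model function indexes its parameter vector, so it is worth checking before reversing anything. The sketch below is one way to check, assuming noise-free data (so the fit recovers the input coefficients) and the least-squares job. It compares the question's hand-written `fit_func` with the built-in `scipy.odr.polynomial(2)` model, which is not used in the question and stores the constant term first.

```python
import numpy as np
import scipy.odr

x = np.arange(-20, 20, 0.1)
y = 1.8 * x**2 - 2.1 * x + 0.6  # assumption: no noise, to make the order obvious

def fit_func(p, t):
    # p[0] multiplies t**2, i.e. highest power first, like numpy.polyfit
    return p[0] * t**2 + p[1] * t + p[2]

data = scipy.odr.RealData(x, y)

# Hand-written model from the question
custom = scipy.odr.ODR(data, scipy.odr.Model(fit_func), beta0=[1.5, -2, 1])
custom.set_job(fit_type=2)
beta_custom = custom.run().beta

# Built-in polynomial model: beta[0] is the constant term (lowest power first)
builtin = scipy.odr.ODR(data, scipy.odr.polynomial(2))
builtin.set_job(fit_type=2)
beta_builtin = builtin.run().beta

print("polyfit       ", np.polyfit(x, y, 2))
print("custom model  ", beta_custom)
print("built-in model", beta_builtin)
```

For the hand-written function the order matches numpy.polyfit; the reversal `beta[::-1]` applies when the parameters come from a model that orders them lowest power first, such as the built-in one.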
