Different covariance matrices from scipy.curve_fit and numpy.polyfit

Question

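I am using Python 3.6 for data fitting. I recently ran into the following problem, and since I lack experience I am not sure how to deal with it.

If I use numpy.polyfit(x, y, 1, cov=True) and scipy.curve_fit(lambda x, a, b: a*x + b, x, y) on the same set of data points, I get nearly the same coefficients a and b. But the values in the covariance matrix from scipy.curve_fit are roughly half the values from numpy.polyfit.

Since I want to use the diagonal of the covariance matrix to estimate the uncertainties of the coefficients (u = numpy.sqrt(numpy.diag(cov))), I have three questions:

1. Which covariance matrix is the right one (which one should I use)?
2. Why is there a difference?
3. What does it take to make them equal?

Thanks!

Edit: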

import numpy as np
import scipy.optimize as sc

data = np.array([[1,2,3,4,5,6,7],[1.1,1.9,3.2,4.3,4.8,6.0,7.3]]).T

x=data[:,0]
y=data[:,1]

A=np.polyfit(x,y,1, cov=True)
print('Polyfit:', np.diag(A[1]))

B=sc.curve_fit(lambda x,a,b: a*x+b, x, y)
print('Curve_Fit:', np.diag(B[1]))

If I use statsmodels.api, the result corresponds to that of curve_fit.

Answer 1

I think it has to do with this part of the polyfit source:

# Some literature ignores the extra -2.0 factor in the denominator, but
#  it is included here because the covariance of Multivariate Student-T
#  (which is implied by a Bayesian uncertainty analysis) includes it.
#  Plus, it gives a slightly more conservative estimate of uncertainty.
if len(x) <= order + 2:
    raise ValueError("the number of data points must exceed order + 2 "
                     "for Bayesian estimate the covariance matrix")
fac = resids / (len(x) - order - 2.0)
if y.ndim == 1:
    return c, Vbase * fac
else:
    return c, Vbase[:,:, NX.newaxis] * fac

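Inside polyfit, order is deg + 1, so with your 7 points and deg = 1, len(x) - order is 5 while (len(x) - order - 2.0) is 3. The polyfit covariance is therefore scaled up by 5/3 ≈ 1.67 relative to the usual residual-variance estimate, which would explain why your curve_fit values are only about 0.6 times (roughly half) the polyfit ones.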
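The two denominators can be compared by hand on the data from the question. The following is a minimal sketch, not the polyfit implementation: it rebuilds the unscaled matrix (Vbase in the snippet above) from the Vandermonde matrix and applies both scalings itself. It does not rely on the output of np.polyfit, because, as far as I know, NumPy 1.16 and later dropped the -2 term from the default scaling. The names cov_standard and cov_minus_two are made up for this illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Data from the question
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([1.1, 1.9, 3.2, 4.3, 4.8, 6.0, 7.3])

deg = 1
order = deg + 1                       # number of fitted coefficients
X = np.vander(x, order)               # columns: x, 1

# Ordinary least squares; resids holds the residual sum of squares
coef, resids, rank, sv = np.linalg.lstsq(X, y, rcond=None)
rss = float(resids[0])

Vbase = np.linalg.inv(X.T @ X)        # unscaled covariance

cov_standard = Vbase * rss / (len(x) - order)          # denominator 5
cov_minus_two = Vbase * rss / (len(x) - order - 2.0)   # denominator 3

popt, pcov = curve_fit(lambda x, a, b: a * x + b, x, y)

ratio = np.diag(cov_minus_two) / np.diag(cov_standard)
print("ratio of variances:", ratio)
print("curve_fit matches the standard scaling:",
      np.allclose(pcov, cov_standard, rtol=1e-3))
```

Under these assumptions the ratio is the same for every entry of the matrix, since only the scalar factor differs.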

That answers question 2. The answer to question 3 is probably "get more data", since for larger len(x) the difference will likely be negligible.
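To get a feeling for how quickly the extra -2.0 stops mattering, here is a small sketch that evaluates the ratio of the two denominators for a few sample sizes. The helper scaling_ratio is hypothetical and assumes a straight-line fit with two parameters.

```python
def scaling_ratio(n_points, n_params=2):
    """Ratio of the (dof - 2) scaling to the plain dof scaling."""
    dof = n_points - n_params
    if dof <= 2:
        raise ValueError("need more than n_params + 2 data points")
    return dof / (dof - 2.0)

for n in (7, 20, 100, 1000):
    print(n, round(scaling_ratio(n), 4))
```

The standard deviations differ by the square root of this ratio, so they converge even faster than the variances.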

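Which formula is correct (question 1) is probably a question for Cross Validated, but I would assume it is curve_fit, since it is explicitly meant for computing uncertainties the way you describe. From the documentation:

pcov : 2d array

The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov)).

The comment in the polyfit code above, on the other hand, suggests that its scaling is aimed more at a Student-T analysis.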
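Applied to the data from the question, the recipe from the documentation looks roughly like this. It is only a sketch: the function name line is made up here, and the printout format is arbitrary.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    """Straight-line model, as in the question."""
    return a * x + b

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([1.1, 1.9, 3.2, 4.3, 4.8, 6.0, 7.3])

popt, pcov = curve_fit(line, x, y)

# One-standard-deviation uncertainties of a and b
perr = np.sqrt(np.diag(pcov))

for name, value, err in zip(("a", "b"), popt, perr):
    print("{} = {:.3f} +/- {:.3f}".format(name, value, err))
```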

Answer 2

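The two methods compute the covariance in different ways. I am not entirely sure which method polyfit uses, but curve_fit estimates the covariance matrix by inverting J.T.dot(J), where J is the Jacobian of the model. Looking at the polyfit code, it appears to invert lhs.T.dot(lhs), where lhs is defined as the Vandermonde matrix, although I have to admit I do not know the mathematical background of this second method.

Now, as to which one is correct, the polyfit code has the following comment: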
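For a model that is linear in its parameters, such as a*x + b, the Jacobian with respect to (a, b) is the Vandermonde matrix itself, so the two inversions give the same unscaled matrix and only the scalar factor applied afterwards can differ. A minimal sketch to check this numerically follows. The forward-difference helper numerical_jacobian is written for this illustration and is not the routine curve_fit uses internally.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)

def model(x, a, b):
    return a * x + b

def numerical_jacobian(params, eps=1e-6):
    """Forward-difference Jacobian of model(x, *params) w.r.t. params."""
    params = np.asarray(params, dtype=float)
    base = model(x, *params)
    columns = []
    for i in range(params.size):
        shifted = params.copy()
        shifted[i] += eps
        columns.append((model(x, *shifted) - base) / eps)
    return np.column_stack(columns)

J = numerical_jacobian([1.0, 0.0])
lhs = np.vander(x, 2)                 # columns: x, 1

print(np.allclose(J, lhs, atol=1e-6))
print(np.allclose(np.linalg.inv(J.T @ J),
                  np.linalg.inv(lhs.T @ lhs), rtol=1e-5))
```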

# Some literature ignores the extra -2.0 factor in the denominator, but
#  it is included here because the covariance of Multivariate Student-T
#  (which is implied by a Bayesian uncertainty analysis) includes it.
#  Plus, it gives a slightly more conservative estimate of uncertainty.

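Based on this, and on your observation, it seems that polyfit always gives a larger estimate than curve_fit. That would make sense, because J.T.dot(J) is only a first-order approximation to the covariance matrix. So, when in doubt, overestimating the error is the safer choice.

However, if you know the measurement errors in your data, I would recommend supplying them and calling curve_fit with absolute_sigma=True. In my own tests, doing so does match the analytical result one would expect, so I would be curious to see which of the two performs better when measurement errors are provided.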
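A sketch of that check, assuming a constant measurement error of 0.2 on every point. That number is invented for the illustration and is not part of the question. The analytical result used for comparison is the weighted least-squares covariance inv(X.T W X) with W = diag(1/sigma**2).

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([1.1, 1.9, 3.2, 4.3, 4.8, 6.0, 7.3])

# Hypothetical measurement errors (assumption, not from the question)
sigma = np.full_like(y, 0.2)

popt, pcov = curve_fit(lambda x, a, b: a * x + b, x, y,
                       sigma=sigma, absolute_sigma=True)

# Analytical weighted least-squares covariance
X = np.vander(x, 2)
W = np.diag(1.0 / sigma**2)
cov_analytic = np.linalg.inv(X.T @ W @ X)

print(np.allclose(pcov, cov_analytic, rtol=1e-3))
```

With absolute_sigma=True the covariance depends only on x and sigma, not on how well the line fits, so no residual-based scaling factor enters at all.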


Answer 1
2, accepted, 2018-08-23 09:35:29

Answer 2
1, 2018-08-23 09:40:51
