scipy.curve_fit and numpy.polyfit give different covariance matrices

Question

I am using Python 3.6 for data fitting. I recently ran into the following problem, and since I lack experience I am not sure how to deal with it.

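If I use numpy.polyfit(x, y, 1, cov=True) and scipy.curve_fit(lambda x, a, b: a*x + b, x, y) on the same set of data points, I get nearly identical coefficients a and b. However, the values in the covariance matrix from scipy.curve_fit are roughly half of those from numpy.polyfit.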

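Since I want to use the diagonal of the covariance matrix to estimate the uncertainties of the coefficients (u = numpy.sqrt(numpy.diag(cov))), I have three questions: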

1. Which covariance matrix is the right one (which one should I use)?
2. Why is there a difference?
3. What would it take to make them equal?

Thanks!

Edit:

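import numpy as np
import scipy.optimize as sc

# Seven (x, y) data points, one pair per row after the transpose
data = np.array([[1, 2, 3, 4, 5, 6, 7], [1.1, 1.9, 3.2, 4.3, 4.8, 6.0, 7.3]]).T

x = data[:, 0]
y = data[:, 1]

# Degree-1 polynomial fit; A[0] holds the coefficients, A[1] the covariance matrix
A = np.polyfit(x, y, 1, cov=True)
print('Polyfit:', np.diag(A[1]))

# Same straight-line model fitted with curve_fit; B[0] is popt, B[1] is pcov
B = sc.curve_fit(lambda x, a, b: a*x + b, x, y)
print('Curve_Fit:', np.diag(B[1]))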

If I use statsmodels.api, the result matches that of curve_fit.

Answer 1

I think it has to do with this part of the polyfit source:

# Some literature ignores the extra -2.0 factor in the denominator, but
#  it is included here because the covariance of Multivariate Student-T
#  (which is implied by a Bayesian uncertainty analysis) includes it.
#  Plus, it gives a slightly more conservative estimate of uncertainty.
if len(x) <= order + 2:
    raise ValueError("the number of data points must exceed order + 2 "
                     "for Bayesian estimate the covariance matrix")
fac = resids / (len(x) - order - 2.0)
if y.ndim == 1:
    return c, Vbase * fac
else:
    return c, Vbase[:,:, NX.newaxis] * fac

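In the polyfit source, order is deg + 1, so for the seven points in the question len(x) - order is 5 and (len(x) - order - 2.0) is 3. polyfit therefore scales its covariance by a factor 5/3 (about 1.7) larger than the conventional one, which would explain why the curve_fit values are only about 0.6 of the polyfit values, roughly the half you observed.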
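The effect of the two denominators can be reproduced by hand. The sketch below is not the NumPy implementation; it rebuilds the unscaled covariance inv(X.T @ X) for the question's data and scales it once by RSS / (n - p) and once by RSS / (n - p - 2), with n = len(x) and p = order = deg + 1 = 2. The variable names are my own. Both versions are computed manually rather than through np.polyfit because, as far as I know, NumPy releases from 1.16 onward dropped the -2 term, so a current install may no longer show the discrepancy.

```python
import numpy as np

# Data from the question
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([1.1, 1.9, 3.2, 4.3, 4.8, 6.0, 7.3])

deg = 1
order = deg + 1                      # number of fitted coefficients (p)
X = np.vander(x, order)              # design matrix with columns x, 1

# Ordinary least-squares fit and residual sum of squares
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = float(np.sum((y - X @ coef) ** 2))

Vbase = np.linalg.inv(X.T @ X)       # unscaled covariance

n = len(x)
cov_standard = Vbase * rss / (n - order)        # denominator n - p
cov_minus2 = Vbase * rss / (n - order - 2.0)    # denominator n - p - 2

ratio = np.diag(cov_minus2) / np.diag(cov_standard)
print("n - p     :", n - order)
print("n - p - 2 :", n - order - 2)
print("variance ratio:", ratio)      # equals (n - p) / (n - p - 2) for every entry
```

Because the two matrices differ only by a scalar, the ratio is the same for every entry and does not depend on the y values at all.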

That explains question 2. The answer to question 3 is probably "get more data": for larger len(x) the difference is likely to be negligible.
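To see how quickly the gap closes, the ratio of the two denominators, (n - p) / (n - p - 2), can be tabulated for a straight-line fit (p = 2). This is only an illustration: the sample sizes are picked arbitrarily and variance_ratio is a hypothetical helper, not a library function.

```python
def variance_ratio(n, p=2):
    """Ratio of a variance scaled by 1/(n - p - 2) to one scaled by 1/(n - p)."""
    if n <= p + 2:
        raise ValueError("need more than p + 2 data points")
    return (n - p) / (n - p - 2)

# Ratio for a few sample sizes; it approaches 1 as n grows
for n in (7, 20, 100, 1000):
    print(n, round(variance_ratio(n), 4))
```

The uncertainties themselves are square roots of the variances, so they differ by the square root of this ratio.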

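Which formula is correct (question 1) is probably a question for Cross Validated, but I would assume it is curve_fit, because it is explicitly intended for computing the uncertainties the way you describe. From the documentation: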

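pcov : 2d array

The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov)).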

The comment in the polyfit code above, on the other hand, suggests that it is aimed more at a Student-T analysis.

Answer 2

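The two methods compute the covariance in different ways. I am not entirely sure which method polyfit uses, but curve_fit estimates the covariance matrix by inverting J.T.dot(J), where J is the Jacobian of the model. Looking at the polyfit code, it seems to invert lhs.T.dot(lhs), where lhs is defined as the Vandermonde matrix, although I must admit I do not know the mathematical background of this second method.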
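For a polynomial model the two descriptions coincide: the Jacobian of a*x + b with respect to (a, b) has the columns x and 1, which is exactly the degree-1 Vandermonde matrix. The sketch below checks this numerically with a forward-difference Jacobian. numerical_jacobian is a hypothetical helper written for this illustration and is not taken from SciPy.

```python
import numpy as np

def model(x, a, b):
    return a * x + b

def numerical_jacobian(f, x, params, eps=1e-6):
    """Forward-difference Jacobian of f(x, *params) with respect to params."""
    params = np.asarray(params, dtype=float)
    f0 = f(x, *params)
    J = np.empty((len(x), len(params)))
    for k in range(len(params)):
        shifted = params.copy()
        shifted[k] += eps
        J[:, k] = (f(x, *shifted) - f0) / eps
    return J

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)

J = numerical_jacobian(model, x, [1.0, 0.0])
V = np.vander(x, 2)                  # columns: x, 1

# Same matrix, hence the same unscaled covariance
print(np.allclose(J, V, atol=1e-6))
print(np.allclose(np.linalg.inv(J.T @ J), np.linalg.inv(V.T @ V), atol=1e-6))
```

So for a straight-line fit the matrix being inverted is the same in both routines, and any remaining difference comes from how the result is scaled.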

Now, as to which one is correct, the polyfit code has the following comment:

# Some literature ignores the extra -2.0 factor in the denominator, but
#  it is included here because the covariance of Multivariate Student-T
#  (which is implied by a Bayesian uncertainty analysis) includes it.
#  Plus, it gives a slightly more conservative estimate of uncertainty.

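Based on this, and on your observation, it seems that polyfit always gives a larger estimate than curve_fit. That would make sense, because J.T.dot(J) is a first-order approximation to the covariance matrix. So, when in doubt, it is always better to overestimate the error.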

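However, if you know the measurement errors in your data, I would recommend supplying them as well and calling curve_fit with absolute_sigma=True. In my own tests, doing so does match the analytical result one would expect, so I would be curious to see which of the two performs better when measurement errors are provided.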
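A minimal sketch of that suggestion, assuming every point carries the same known measurement error. The value sigma_y = 0.2 is made up for illustration; the question does not state any measurement errors. With absolute_sigma=True the returned pcov is not rescaled by the residuals, so for this linear model it should agree with the analytic result sigma_y**2 * inv(X.T @ X).

```python
import numpy as np
from scipy.optimize import curve_fit

# Data from the question
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([1.1, 1.9, 3.2, 4.3, 4.8, 6.0, 7.3])

sigma_y = 0.2                        # assumed measurement error, not from the question
sigma = np.full_like(y, sigma_y)

def line(x, a, b):
    return a * x + b

# sigma is treated as absolute, so pcov reflects the stated errors directly
popt, pcov = curve_fit(line, x, y, sigma=sigma, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))        # one-standard-deviation uncertainties

# Analytic covariance of a linear least-squares fit with known, equal errors
X = np.vander(x, 2)
pcov_analytic = sigma_y**2 * np.linalg.inv(X.T @ X)

print("popt:", popt)
print("perr:", perr)
print("matches analytic:", np.allclose(pcov, pcov_analytic, rtol=1e-4))
```

With absolute_sigma left at its default of False, curve_fit treats sigma only as relative weights and rescales pcov using the residuals, so the uncertainties would no longer depend on the value chosen for sigma_y.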

Answer 1: score 2, accepted, 2018-08-23 09:35:29
Answer 2: score 1, 2018-08-23 09:40:51