Python distribution fitting with sum of squared errors (SSE)

Question

I am trying to find the best-fitting distribution curve for my data, which consists of

y-axis = [0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44, 
          0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05, 
          0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0]

The y-axis values are the probabilities of the event occurring in the time periods on the x-axis:

x-axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 
          12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 
          22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 
          32.0, 33.0, 34.0]

I am doing this in Python, following the example given in "Fitting empirical distribution to theoretical ones with Scipy (Python)?".

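Specifically, I am trying to recreate the section titled "Distribution Fitting with Sum of Square Error (SSE)", in which you run through different distributions to find the one that fits the data best.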
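For reference, the SSE section of that example works on raw i.i.d. observations rather than on x/y pairs. Its core loop can be sketched as follows; the synthetic gamma sample, the 50 bins, the seed and the three candidate distributions are illustrative assumptions, not taken from the original.

```python
import numpy as np
from scipy import stats

# Synthetic i.i.d. observations (assumption: they stand in for the raw data
# the referenced example works with)
rng = np.random.default_rng(0)
data = stats.gamma.rvs(a=2.0, scale=3.0, size=5000, random_state=rng)

# Histogram heights as a density, evaluated at the bin centres
heights, edges = np.histogram(data, bins=50, density=True)
centres = (edges[:-1] + edges[1:]) / 2.0


def sse_for(dist, data, x, y):
    """Fit `dist` by maximum likelihood, return (SSE against y, params)."""
    params = dist.fit(data)      # shape(s) first, then loc and scale
    pdf = dist.pdf(x, *params)   # fitted density at the bin centres
    return np.sum((y - pdf) ** 2), params


candidates = {"norm": stats.norm, "expon": stats.expon, "gamma": stats.gamma}
results = {name: sse_for(d, data, centres, heights) for name, d in candidates.items()}

# The distribution with the smallest SSE is the best fit
best = min(results, key=lambda name: results[name][0])
for name, (sse, params) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:6s} SSE={sse:.6f} params={np.round(params, 3)}")
print("best:", best)
```

Each candidate is fitted to the raw sample with `dist.fit` and then scored by the SSE between its pdf and the histogram heights; the lowest SSE wins.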

How can I modify that example so that it works with my data?

Updated version based on Bill's response, but now I am trying to plot the fitted curve against the data and seeing something off:

# %matplotlib inline  (Jupyter magic; uncomment when running in a notebook)
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gamma, lognorm, loglaplace
from scipy.optimize import curve_fit

x_axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0]
y_axis = [0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44, 0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05, 0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0]

plt.rcParams['figure.figsize'] = (16.0, 12.0)
plt.style.use('ggplot')

# Gamma pdf with shape a, location loc and scale, in the form curve_fit expects
def f(x, a, loc, scale):
    return gamma.pdf(x, a, loc, scale)

result, pcov = curve_fit(f, x_axis, y_axis)

# get curve shape, location, scale
shape = result[:-2]
loc = result[-2]
scale = result[-1]

# construct the curve
x = np.linspace(0, 36, 100)
y = f(x, *result)

# data as bars (default width), fitted curve as a green line
plt.bar(x_axis, y_axis, alpha=0.75)
plt.plot(x, y, c='g')
plt.show()

Answer 1

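Your situation is different from the one treated in the question you cite. You have the ordinates and abscissae of data points, rather than the usual i.i.d. sample. I suggest you use scipy's curve_fit. Here is an example.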

x_axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0]
y_axis = [0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44, 0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05, 0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0]

## y_axis values must be normalised
sum_ys = sum(y_axis)
y_axis = [_/sum_ys for _ in y_axis]
print (sum(y_axis))

from scipy.stats import gamma, norm
from scipy.optimize import curve_fit

def gamma_f(x, a, loc, scale):
    return gamma.pdf(x, a, loc, scale)

def norm_f(x, loc, scale):
    return norm.pdf(x, loc, scale)

fitting = norm_f

result = curve_fit(fitting, x_axis, y_axis)
print (result)

import matplotlib.pyplot as plt

plt.plot(x_axis, y_axis, 'ro')
plt.plot(x_axis, [fitting(_, *result[0]) for _ in x_axis], 'b-')
plt.axis([0,35,0,.5])
plt.show()

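This version shows how to plot the normal fit against the data. (The gamma gives a poor fit.) The normal needs only two parameters. In general you only need the first part of the output, namely the parameter estimates. The output below is from the gamma version (fitting = gamma_f), whose three estimates are shape, location and scale: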

(array([  2.3352639 ,  -3.08105104,  10.15024823]), array([[   5954.86532869,  -27818.92220973,  -19675.22421994],
       [ -27818.92220973,  133161.76500251,   90741.43608615],
       [ -19675.22421994,   90741.43608615,   66054.79087992]]))
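A minimal sketch of unpacking that output for the normal fit: `curve_fit` returns the estimates (`popt`) first and their covariance matrix (`pcov`) second. The starting guess `p0`, taken from the weighted mean and spread of the normalised heights, is an assumption added here to give the optimiser a sensible start.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

x_axis = np.arange(1.0, 35.0)  # 1.0, 2.0, ..., 34.0
y_axis = np.array([0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44,
                   0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05,
                   0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0])
y_axis = y_axis / y_axis.sum()  # normalise so the heights sum to 1


def norm_f(x, loc, scale):
    return norm.pdf(x, loc, scale)


# Starting guess (assumption): weighted mean and standard deviation of the data
mean = np.sum(x_axis * y_axis)
std = np.sqrt(np.sum(y_axis * (x_axis - mean) ** 2))

popt, pcov = curve_fit(norm_f, x_axis, y_axis, p0=[mean, std])
loc, scale = popt                # the parameter estimates
perr = np.sqrt(np.diag(pcov))    # one-standard-deviation uncertainties
print("loc = %.3f +/- %.3f, scale = %.3f +/- %.3f" % (loc, perr[0], scale, perr[1]))
```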

Note that the gamma pdf is also available in scipy, as I think are the others you need, which saves you coding them yourself.

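The most important thing I omitted from my first version of the code was the need to normalise the y-values, that is, to make them sum to 1, since they should approximate the heights of a histogram.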
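The normalisation can be sketched as a small helper. It assumes unit-width time bins as in the question; `dx` is a hypothetical parameter for other bin widths, and dividing by it makes the heights integrate to 1 like a pdf.

```python
import numpy as np


def normalise_heights(y, dx=1.0):
    """Scale histogram heights so that sum(heights) * dx == 1."""
    y = np.asarray(y, dtype=float)
    return y / (y.sum() * dx)


y_axis = [0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44,
          0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05,
          0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0]

density = normalise_heights(y_axis)  # unit-width bins, as in the question
print(density.sum())
```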

Answer 2

I tried your example with the OpenTURNS platform. Here is what I got.

After importing openturns, and openturns.viewer.View for plotting, I started with the same data as you:

    import openturns as ot
    from openturns.viewer import View

    x_axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 
          12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 
          22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 
          32.0, 33.0, 34.0]

    y_axis = [0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44, 
          0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05, 
          0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0]

Step 1: we can define the corresponding distribution:

    distribution = ot.UserDefined(ot.Sample([[s] for s in x_axis]), y_axis)
    graph = distribution.drawPDF()
    graph.setColors(["black"])
    graph.setLegends(["your input"])

At this stage, View(graph) displays the PDF of your input data.

Step 2: we can draw a sample from the distribution obtained above:

    sample = distribution.getSample(10000)

This sample can then be used to fit any kind of distribution. I tried the WeibullMin and Gamma distributions:

    # WeibullMin Factory
    distribution2 = ot.WeibullMinFactory().build(sample)
    print(distribution2)
    # >>> WeibullMin(beta = 8.83969, alpha = 1.48142, gamma = 4.76832)
    graph2 = distribution2.drawPDF()
    graph2.setLegends(["Best WeibullMin"])

    # Gamma Factory
    distribution3 = ot.GammaFactory().build(sample)
    print(distribution3)
    # >>> Gamma(k = 2.08142, lambda = 0.25157, gamma = 4.9995)
    graph3 = distribution3.drawPDF()
    graph3.setLegends(["Best Gamma"])
    graph3.setColors(["blue"])

    # plotting all the results on one graph
    graph.add(graph2)
    graph.add(graph3)
    View(graph)
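For comparison, a rough NumPy/SciPy analogue of Steps 1 and 2 followed by a maximum-likelihood fit. It assumes `numpy.random.Generator.choice` as the weighted sampler and `scipy.stats.gamma.fit` as the estimator; the estimation method may differ from the one used by the OpenTURNS factories, and since the sample is random the estimates will vary slightly with the (arbitrary) seed.

```python
import numpy as np
from scipy import stats

x_axis = np.arange(1.0, 35.0)  # 1.0, 2.0, ..., 34.0
y_axis = np.array([0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44,
                   0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05,
                   0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0])

# Step 1 analogue: a discrete distribution on x_axis whose probabilities
# are proportional to y_axis
probs = y_axis / y_axis.sum()

# Step 2 analogue: draw a weighted sample of the same size as above
rng = np.random.default_rng(42)
sample = rng.choice(x_axis, size=10000, p=probs)

# Fit a three-parameter gamma (shape a, loc, scale) by maximum likelihood
a, loc, scale = stats.gamma.fit(sample)
print(f"gamma: a={a:.3f}, loc={loc:.3f}, scale={scale:.3f}")
```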

Answer 3

I think this is the best and simplest way to compute the sum of squared errors:

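# Define the function
import numpy as np

def SSE(y_true, y_pred):
    """Return (and print) the sum of squared errors between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)
    print(sse)
    return sse

# Now call the function and get the result
# (y_true: observed values, y_pred: values predicted by the fitted curve)
SSE(y_true, y_pred)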
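Combining this SSE function with the curve_fit approach from Answer 1 gives a sketch of the SSE comparison adapted to x/y data: fit each candidate pdf to the normalised heights, compute the SSE of each fit and keep the smallest. The candidate list, the moment-based starting guesses and the bounds (positive shape and scale) are assumptions; candidates whose fit fails are skipped.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

x = np.arange(1.0, 35.0)  # 1.0, 2.0, ..., 34.0
y = np.array([0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44,
              0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05,
              0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0])
y = y / y.sum()  # normalised heights (unit-width bins)


def sse(y_true, y_pred):
    return float(np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def make_model(dist):
    """Wrap dist.pdf as f(x, *shapes, loc, scale) for curve_fit."""
    def model(x, *params):
        return dist.pdf(x, *params)
    return model


# Moment-based starting values (assumption)
mean = np.sum(x * y)
std = np.sqrt(np.sum(y * (x - mean) ** 2))
s0 = np.sqrt(np.log(1.0 + (std / mean) ** 2))

# name -> (distribution, initial guess: shapes..., loc, scale)
candidates = {
    "norm": (stats.norm, [mean, std]),
    "gamma": (stats.gamma, [mean ** 2 / std ** 2, 0.0, std ** 2 / mean]),
    "lognorm": (stats.lognorm, [s0, 0.0, mean / np.exp(s0 ** 2 / 2.0)]),
}

results = {}
for name, (dist, p0) in candidates.items():
    # shape(s) and scale must stay positive; loc is unbounded
    lower = [1e-9] * dist.numargs + [-np.inf, 1e-9]
    upper = [np.inf] * (dist.numargs + 2)
    model = make_model(dist)
    try:
        popt, _ = curve_fit(model, x, y, p0=p0, bounds=(lower, upper))
    except (RuntimeError, ValueError, np.linalg.LinAlgError):
        continue  # skip candidates whose fit fails
    results[name] = (sse(y, model(x, *popt)), popt)

best = min(results, key=lambda name: results[name][0])
for name, (err, popt) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:8s} SSE={err:.6f} params={np.round(popt, 3)}")
print("best:", best)
```

A lower SSE only means a closer fit at these 34 points; with noisy bins, close scores may not be meaningful.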

3 solutions

Solution 1 (score 0, accepted): 2017-04-01 19:28:29

Solution 2 (score 0): 2020-10-25 19:36:02

Solution 3 (score 0): 2021-06-15 08:10:56
