SciPy curve_fit() doesn't fit curve

I generated a random curve and tried to use SciPy's curve_fit to fit the best curve to the plot, but the fit fails.

First, I generated a random exponential decay graph, where A, w, and T2 are randomly generated using numpy:

def expDec(t, A, w, T2):
    return A * np.cos(w * t) * (2.718**(-t / T2))

Now I have SciPy guess the best-fit curve:

t = x['Input'].values
hr = x['Output'].values
c, cov = curve_fit(bpm, t, hr)

Then I plot the curve:

for i in range(n):
    y[i] = bpm(x['Input'][i], c[0], c[1], c[2])
plt.plot(x['Input'], x['Output'])
plt.plot(x['Input'], y)

That's it. Here's how bad the fit looks:

[image]

If anyone can help, that would be great.

MWE (also available interactively here):

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

inputs = []
outputs = []

# THIS GIVES THE DOMAIN
dom = np.linspace(-5, 5, 100)

# FUNCTION & PARAMETERS (RANDOMLY SELECTED)
A = np.random.uniform(3, 6)
w = np.random.uniform(3, 6)
T2 = np.random.uniform(3, 6)
y = A * np.cos(w * dom) * (2.718**(-dom / T2))

# DEFINES EXPONENTIAL DECAY FUNCTION
def expDec(t, A, w, T2):
    return A * np.cos(w * t) * (2.718**(-t / T2))

# SETS UP FIGURE FOR PLOTTING
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# PLOTS THE FUNCTION
plt.plot(dom, y, 'r')

# SHOW THE PLOT
plt.show()

for i in range(-9, 10): 
    inputs.append(i)
    outputs.append(expDec(i, A, w, T2))
    
# PUT IT DIRECTLY IN A PANDAS DATAFRAME
points = {'Input': inputs, 'Output': outputs}

x = pd.DataFrame(points, columns = ['Input', 'Output'])
   
# FUNCTION WHOSE PARAMETERS PROGRAM SHOULD BE GUESSING
def bpm(t, A, w, T2):
    return A * np.cos(w * t) * (2.718**(-t / T2))

# INPUT & OUTPUTS
t = x['Input'].values
hr = x['Output'].values

# USE SCIPY CURVE FIT (NONLINEAR LEAST SQUARES) TO FIND BEST PARAMETERS. ALLOW AT MOST 1000 FUNCTION EVALUATIONS BEFORE STOPPING.
constants = curve_fit(bpm, t, hr, maxfev=1000)

# GET CONSTANTS FROM CURVE_FIT
A_fit = constants[0][0]
w_fit = constants[0][1]
T2_fit = constants[0][2]

# CREATE ARRAY TO HOLD FITTED OUTPUT
fit = []

# APPEND OUTPUT TO FIT=[] ARRAY
for i in range(-9,10):
    fit.append(bpm(i, A_fit, w_fit, T2_fit))
    
# PLOTS BEST PARAMETERS
plt.plot(x['Input'], x['Output'])
plt.plot(x['Input'], fit, "ro-")

As a first step, I would like to rewrite your MCVE to use vectorized operations and only a single instance of the function computation. This will reduce everything to just a couple of lines. I recommend using a seed for repeatability when you do your testing as well:

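import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

def exp_dec(t, A, w, T2):
    return A * np.cos(w * t) * np.exp(-t / T2)

# Seed the generator so the random parameters are repeatable
np.random.seed(42)
A, w, T2 = np.random.uniform(3, 6, size=3)
dom = np.linspace(-9, 9, 1000)  # dense grid, used for plotting only

# Samples to fit: one point per integer t
t = np.arange(-9., 10.)
hr = exp_dec(t, A, w, T2)

# No p0 is given, so the fit starts from the default initial guess
fit, _ = curve_fit(exp_dec, t, hr)

fig, ax = plt.subplots()
ax.plot(dom, exp_dec(dom, A, w, T2), 'g', label='target')
ax.scatter(t, hr, c='r', label='samples')
ax.plot(dom, exp_dec(dom, *fit), 'b', label='fit')
ax.plot(dom, exp_dec(dom, 1, 1, 1), 'k:', label='start')
ax.legend()
plt.show()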

To explain the last plotted item, take a look at the docs for curve_fit. Notice that there is a parameter p0, which defaults to all ones if you do not supply it. That is the initial guess from which the fit starts searching for parameter values.
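As a small sketch of what that default means in practice, the snippet below runs the same fit twice: once without p0 and once with the starting point written out as ones. The names fit_default and fit_ones are introduced here only for illustration; the data setup repeats the seeded script above.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_dec(t, A, w, T2):
    return A * np.cos(w * t) * np.exp(-t / T2)

# Same seeded setup as in the rewritten MCVE
np.random.seed(42)
A, w, T2 = np.random.uniform(3, 6, size=3)
t = np.arange(-9., 10.)
hr = exp_dec(t, A, w, T2)

# No p0 given: every parameter starts at 1
fit_default, _ = curve_fit(exp_dec, t, hr)

# The same starting point, spelled out explicitly
fit_ones, _ = curve_fit(exp_dec, t, hr, p0=[1.0, 1.0, 1.0])

print(fit_default)
print(fit_ones)
```

Both calls start from the same point, so they should report the same (poor) parameters.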

[image]

Looking at this picture, you can pretty much see what the problem is. The starting guess has a much lower frequency than your data. Because the sampling frequency is so close to the oscillation frequency, the fit hits a local minimum before it is able to increase the frequency enough to reach the right function. You can fix this in a couple of different ways.
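One way to see how weakly such coarse samples pin down the frequency is the sketch below. It is my own illustration, not part of the original script: with one sample per unit of t, the cosine at w and the cosine at 2π − w (called w_alias here, a name made up for this example) take identical values at every sample point, even though the two curves differ between samples.

```python
import numpy as np

# Same seeded parameters as in the rewritten MCVE
np.random.seed(42)
A, w, T2 = np.random.uniform(3, 6, size=3)

t = np.arange(-9., 10.)    # integer sample times, one sample per unit of t
w_alias = 2 * np.pi - w    # a much lower angular frequency

# cos((2*pi - w) * t) == cos(w * t) whenever t is an integer,
# so the samples cannot tell the two frequencies apart.
same_on_samples = np.allclose(np.cos(w * t), np.cos(w_alias * t))

# On a dense grid the two oscillations are clearly different.
dom = np.linspace(-9, 9, 1000)
same_on_dense = np.allclose(np.cos(w * dom), np.cos(w_alias * dom))

print(w, w_alias)
print(same_on_samples, same_on_dense)
```

This is only meant to show why the optimizer gets so little guidance about the frequency from these 19 points.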

One way is to give curve_fit a better initial guess. If you know bounds on the amplitude, frequency and decay rate, use them. The amplitude will generally be a straightforward linear fit. The toughest one is usually the frequency, and as you can see here, it is better to over-estimate it. But if you over-estimate it by too much, you might end up with a harmonic of the original data.
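To illustrate the remark about the amplitude: once w and T2 are held fixed, the model is just A times a known vector, so A has a closed-form least-squares solution. The sketch below assumes you already have guesses for w and T2 (the hypothetical w_guess and T2_guess, set to the true values here purely to keep the example short).

```python
import numpy as np

def exp_dec(t, A, w, T2):
    return A * np.cos(w * t) * np.exp(-t / T2)

# Same seeded setup as in the rewritten MCVE
np.random.seed(42)
A, w, T2 = np.random.uniform(3, 6, size=3)
t = np.arange(-9., 10.)
hr = exp_dec(t, A, w, T2)

# Hypothetical guesses for the nonlinear parameters (true values used
# here only for illustration; in practice these come from prior knowledge)
w_guess, T2_guess = w, T2

# With w and T2 fixed, hr ~ A * basis, and linear least squares gives
# A = (basis . hr) / (basis . basis)
basis = exp_dec(t, 1.0, w_guess, T2_guess)
A_guess = (basis @ hr) / (basis @ basis)

print(A_guess, A)
```

The resulting A_guess could then be passed as the first entry of p0.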

Here are a couple of sample fits that show different local minima in the optimization. The second one shows a harmonic case caused by over-estimating the oscillation frequency:

[image]

[image]

A decent set of starting parameters is the upper bound of your random range:

fit, _ = curve_fit(exp_dec, t, hr, p0=[6, 6, 6])

[image]

The green curve matches the blue one so closely that you cannot see it:

>>> A, w, T2
(4.123620356542087, 5.852142919229749, 5.195981825434215)
>>> tuple(fit)
(4.123620356542086, 5.852142919229749, 5.195981825434215)
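If no single starting point is obviously good, the known parameter ranges can also be scanned. The following is a minimal multi-start sketch, not something from the original script: it tries several initial frequencies between 3 and 6 (the grid and the names w0, best_fit and best_sse are my own choices) and keeps the fit with the smallest sum of squared residuals on the samples.

```python
import warnings

import numpy as np
from scipy.optimize import curve_fit

def exp_dec(t, A, w, T2):
    return A * np.cos(w * t) * np.exp(-t / T2)

# Same seeded setup as in the rewritten MCVE
np.random.seed(42)
A, w, T2 = np.random.uniform(3, 6, size=3)
t = np.arange(-9., 10.)
hr = exp_dec(t, A, w, T2)

best_fit, best_sse = None, np.inf

# Silence overflow and covariance warnings from poor starting points
with warnings.catch_warnings(), np.errstate(all='ignore'):
    warnings.simplefilter('ignore')
    for w0 in np.linspace(3, 6, 7):  # 3.0, 3.5, ..., 6.0
        try:
            trial, _ = curve_fit(exp_dec, t, hr, p0=[6, w0, 6])
        except (RuntimeError, ValueError):
            continue  # this start did not converge; try the next one
        sse = np.sum((exp_dec(t, *trial) - hr) ** 2)
        if np.isfinite(sse) and sse < best_sse:
            best_fit, best_sse = trial, sse

print(best_fit, best_sse)
```

The grid includes the p0=[6, 6, 6] start used above, so the scan does at least as well as that single guess. Note that it ranks fits only by how well they match the samples.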

Another way to fix the problem is to sample the data more frequently. More data will generally mean a lower chance of hitting a false local minimum in the optimization. However, when dealing with sinusoidal functions, this does not always help because of how the matching works. Here is an example with 10x the number of samples (a fit with just 2x and the default guess fails entirely):

...
t = np.arange(-9., 10., 0.1)
...

[image]
