
Scipy curve_fit fails for data with sine function

I'm trying to fit a curve through some data. The function I'm trying to fit is as follows:

def f(x, a, b, c):
    return a + b * x**c

When using scipy.optimize.curve_fit I do not get any results: it just returns the (default) initial parameters:

(array([ 1.,  1.,  1.]),
 array([[ inf,  inf,  inf],
        [ inf,  inf,  inf],
        [ inf,  inf,  inf]]))

I've tried reproducing the data, and found that a sine function was causing the problem (the data contains daily variation):

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

xdata = np.random.rand(1000) + 0.002 * np.sin(np.arange(1000) / (1.5 * np.pi))
ydata = 0.1 + 23.4 * xdata**0.56 + np.random.normal(0, 2, 1000)

def f(x, a, b, c):
    return a + b * x**c

fit = curve_fit(f, xdata, ydata)

fig, ax = plt.subplots(1, 1)
ax.plot(xdata, ydata, 'k.', markersize=3)
ax.plot(np.arange(0, 1, .01), f(np.arange(0, 1, .01), *fit[0]))
fig.show()

I would obviously expect curve_fit to return something close to [0.1, 23.4, .56].

Note that the sine term does not really seem to affect the values of 'xdata' much: the first term of xdata ranges between 0 and 1 and I'm only adding something between -0.002 and +0.002, yet it does cause the fitting procedure to fail. I found 0.002 to be close to the 'critical' amplitude for failure: if it is smaller, the procedure is less likely to fail, and vice versa; at 0.002 it fails about as often as not.
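
As an illustrative aside (this check is not in the original post), that 'critical' amplitude can be probed empirically by counting, over repeated draws, how often the sine term pushes at least one xdata value below zero; the helper name count_negative_draws is made up for this sketch:

import numpy as np

def count_negative_draws(amplitude, n_points=1000, n_trials=200, seed=0):
    # fraction of trials in which at least one xdata value ends up below zero
    rng = np.random.default_rng(seed)
    phase = np.arange(n_points) / (1.5 * np.pi)
    hits = 0
    for _ in range(n_trials):
        xdata = rng.random(n_points) + amplitude * np.sin(phase)
        if (xdata < 0).any():
            hits += 1
    return hits / n_trials

for amp in (0.0005, 0.001, 0.002, 0.004):
    print(amp, count_negative_draws(amp))

In this sketch, at an amplitude around 0.002 roughly half of the draws contain a negative xdata value, which lines up with the "fails about as often as not" observation.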

I have tried solving this problem by shuffling the 'xdata' and 'ydata' simultaneously, to no effect. I thought (for no particular reason) that perhaps removing the autocorrelation of the data would solve the problem.

So my question is: how can I fix/bypass this problem? I can change the sine contribution in the synthetic data in the snippet above, but for my real data I obviously cannot.

You can eliminate the NaNs generated by negative x-values within the model function:

def f(x, a, b, c):
    y = a + b * x**c
    y[np.isnan(y)] = 0.0
    return y
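
For context (an illustrative check, not part of the original answer): the NaNs come from the few xdata values that the sine term pushes slightly below zero, because a negative base raised to a non-integer power is undefined in real arithmetic and NumPy returns nan for it:

import numpy as np

x = np.array([-0.001, 0.0, 0.5, 1.0])
y = x ** 0.56             # emits an "invalid value" RuntimeWarning
print(y)                  # the negative entry becomes nan
print(np.isnan(y).any())  # True

Once the model output contains NaNs, the least-squares routine inside curve_fit cannot make progress, which is why it returns the initial parameters [1., 1., 1.] with an all-inf covariance matrix, as shown in the question.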

Replacing all NaNs by 0 might not be the best choice. You could try neighbouring values or do some kind of extrapolation instead.
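
A minimal sketch of the neighbouring-value idea, assuming the NaN positions are simply interpolated from the nearest valid outputs (f_interp and its index-based filling are hypothetical, not from the original answer):

import numpy as np

def f_interp(x, a, b, c):
    y = a + b * x**c
    bad = np.isnan(y)
    if bad.any():
        # fill NaN positions from neighbouring valid values;
        # np.interp extrapolates at the edges by holding the end values constant
        y[bad] = np.interp(np.flatnonzero(bad), np.flatnonzero(~bad), y[~bad])
    return y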

If you feed in generated test data you have to make sure that there are no NaNs in there either. So directly after data generation put something like:

if xdata.min() < 0:
    print('expecting NaNs')
    ydata[np.isnan(ydata)] = 0.0
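
Putting the pieces together, a hedged end-to-end sketch (not code from the original answer) using the NaN-safe model from above; with the NaNs handled, the fit should recover parameters close to the [0.1, 23.4, 0.56] used to generate the data:

import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    y = a + b * x**c
    y[np.isnan(y)] = 0.0           # NaN-safe model, as in the answer above
    return y

xdata = np.random.rand(1000) + 0.002 * np.sin(np.arange(1000) / (1.5 * np.pi))
ydata = 0.1 + 23.4 * xdata**0.56 + np.random.normal(0, 2, 1000)
if xdata.min() < 0:
    ydata[np.isnan(ydata)] = 0.0   # clean the generated test data as well

popt, pcov = curve_fit(f, xdata, ydata)
print(popt)                        # expected to land near [0.1, 23.4, 0.56]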
