
Problems with curve_fit from scipy.optimize

I know there are some similar questions, but since none of them got me any further, I decided to ask my own. I'm sorry if the answer to my problem is already out there somewhere, but I really couldn't find it.

I tried fitting f(x) = a*x**b to rather linear data using curve_fit. It runs without errors, but the result is way off, as shown below:

[image: plot of the data with the poor fit]

The thing is that I don't really know what I am doing; on the other hand, fitting is always more of an art than a science, and there has been at least one general bug in scipy.optimize.

My data looks like this:

x-values:

[16.8, 2.97, 0.157, 0.0394, 14.0, 8.03, 0.378, 0.192, 0.0428, 0.0298, 0.000781, 0.000789]

y-values:

[14561.7667, 7154.795, 661.5375, 104.5145, 40307.95, 15993.9333, 1798.1167, 1015.0477, 194.938, 136.8283, 9.95316, 12.0731]

That's my code (based on a really nice example in the last answer to that question):

import numpy as np
from scipy.optimize import curve_fit

def func(x, p0, p1):  # the model we think follows the data: a power law
    return p0 * (x ** p1)

xvalues = np.asarray(xvalues)  # plain lists must be arrays for the arithmetic below
yvalues = np.asarray(yvalues)

# Initial guesses passed as p0; curve_fit iterates from these to the best fit
popt, pcov = curve_fit(func, xvalues, yvalues, p0=(1.0, 1.0))  # alternatively p0=(3107, 0.944)

print(popt)  # the two best-fit parameters

# Sum of squared residuals
p0, p1 = popt
residuals = yvalues - func(xvalues, p0, p1)
fres = np.sum(residuals**2)

print('chi-square')
print(fres)

xaxis = np.linspace(5e-4, 20)  # we can plot with xdata, but the fit will not look good
curve_y = func(xaxis, p0, p1)

The starting values come from a fit with gnuplot that looks plausible, but I still need to cross-check them.

This is the printed output (first the fitted p0 and p1, then chi-square):

[  4.67885857e+03   6.24149549e-01]
chi-square
424707043.407

I guess this is a difficult question, so many thanks in advance!

When fitting, curve_fit minimizes the sum of (data - model)^2 / (error)^2.

If you don't pass in errors (as you do here), curve_fit assumes that every point has an error of 1.

In this case, since your data spans many orders of magnitude, the points with the largest y values dominate the objective function, causing curve_fit to fit them at the expense of all the others.
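You can see this directly by breaking the unweighted chi-square down per point. This sketch refits the question's data (values rounded) and prints each point's share of the objective; the largest-y points account for nearly all of it:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, p0, p1):
    return p0 * x**p1

x = np.array([16.8, 2.97, 0.157, 0.0394, 14.0, 8.03,
              0.378, 0.192, 0.0428, 0.0298, 0.000781, 0.000789])
y = np.array([14561.77, 7154.80, 661.54, 104.51, 40307.95, 15993.93,
              1798.12, 1015.05, 194.94, 136.83, 9.95, 12.07])

popt, _ = curve_fit(func, x, y, p0=(1.0, 1.0))  # unweighted fit, as in the question

share = (y - func(x, *popt))**2   # each point's squared residual ...
share /= share.sum()              # ... as a fraction of the total chi-square
for xi, yi, s in sorted(zip(x, y, share), key=lambda t: -t[2]):
    print(f"x={xi:<10g} y={yi:<10g} share of chi-square: {s:.1%}")
```

With this data, the two points with the largest y values dominate the sum, so the fit effectively ignores the rest.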

The best way to fix this is to include the errors on your yvalues in the fit (it looks like you have them, since your plot shows error bars!). You can do this by passing them as the sigma parameter of curve_fit.
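A sketch of what that call could look like. The yerr array here is a stand-in for your real measurement uncertainties (a 10% relative error is assumed purely for illustration), and the starting values are the gnuplot ones quoted in the question:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, p0, p1):
    return p0 * x**p1

x = np.array([16.8, 2.97, 0.157, 0.0394, 14.0, 8.03,
              0.378, 0.192, 0.0428, 0.0298, 0.000781, 0.000789])
y = np.array([14561.77, 7154.80, 661.54, 104.51, 40307.95, 15993.93,
              1798.12, 1015.05, 194.94, 136.83, 9.95, 12.07])

# Placeholder uncertainties: substitute your actual error bars here.
yerr = 0.1 * y  # assumed 10% relative error, for illustration only

popt, pcov = curve_fit(func, x, y, p0=(3107.0, 0.944),
                       sigma=yerr, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))  # 1-sigma parameter uncertainties
print(popt, perr)
```

With relative errors like these, every decade of the data carries comparable weight, so the small-x points are no longer ignored.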

I would rethink the experimental part. Two datapoints are questionable: [image: plot highlighting the two questionable datapoints]

The image you showed us looks pretty good because you took the log:

[image: log-scale plot of the data]

You could do a linear fit on log(x) and log(y). That way you limit the impact of the largest residuals. Another approach is robust regression (RANSAC from sklearn, or least_squares from scipy with a robust loss).
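A minimal sketch of the log-log linear fit, using the question's data (values rounded): fitting a straight line to (log x, log y) recovers the power-law parameters directly.

```python
import numpy as np

x = np.array([16.8, 2.97, 0.157, 0.0394, 14.0, 8.03,
              0.378, 0.192, 0.0428, 0.0298, 0.000781, 0.000789])
y = np.array([14561.77, 7154.80, 661.54, 104.51, 40307.95, 15993.93,
              1798.12, 1015.05, 194.94, 136.83, 9.95, 12.07])

# Straight line in log-log space: log(y) = b*log(x) + log(a)
b, loga = np.polyfit(np.log(x), np.log(y), 1)
a = np.exp(loga)
print(f"power law: y = {a:.1f} * x**{b:.3f}")
```

Because the fit is done on the logarithms, each point contributes on a comparable scale, so this gives a much more balanced estimate than the unweighted fit in linear space.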

Nevertheless, you should either gather more datapoints or repeat the measurements.
