Can I use scipy's curve_fit in Python when one of the fitted parameters changes the xdata input array values?

Question

This is my first time posting a question and I'm going to try to make it as clear as I can but feel free to ask questions.

I'm trying to fit a model to a curve using the scipy.optimize.curve_fit function as below:

import numpy as np
import matplotlib.pyplot as pyplot
import scipy
from scipy.optimize import curve_fit




def func2(x,EM):
    return (((4.0*EM*(np.sqrt(8*10**-9)))/(3.0*(1.0-(0.5**2))*8*10**-9))*(((((x))*1*10**-9)**((3.0/2.0)))))


ydata=[-0.003428768, -0.009050058, -0.0037997673999999996, -0.0003833233, -0.007557649, -0.0034860994, -0.0009856887, -0.0017508664, -0.00036931394999999996, 
       -0.0040713947, -0.005737315000000001, 0.0005120568, -0.007336486, -0.00719302, -0.0039941817, -0.0029785274, -0.0013044578, -0.008190335, -0.00833507,
       -0.0074282060000000006, -0.009629990000000001, -0.009425125, -0.008662485999999999, -0.0019445216, -0.008331748, -0.009513038, -0.0047609017, -0.004364422,
       -0.010325097, -0.0036570733, -0.0060091914, -0.005655772, -0.0045517069999999995, -0.00066998035, 0.006374902, 0.006445733, 0.0019101816,
       0.010262737999999999, 0.011139007, 0.018161469, 0.016963122, 0.022915895, 0.027177791, 0.028707139, 0.040105638, 0.044088004, 0.041657403,
       0.052325636999999994, 0.062399405, 0.07020844, 0.076979915, 0.08888523, 0.099634745, 0.10961602, 0.12188646, 0.13677225, 0.15639512, 0.16833586,
       0.18849944000000002, 0.21515548, 0.23989769000000002, 0.26319308, 0.29388397, 0.321042, 0.35637776, 0.38564656999999997, 0.4185209, 0.44986692,
       0.48931552999999994, 0.52583893, 0.5626885, 0.6051665, 0.6461075, 0.69644346, 0.7447817, 0.7931281, 0.8381386000000001, 0.8883482, 0.9395609999999999,
       0.9853629, 1.0377034, 1.0889026, 1.1334094]


xdata=[34.51388, 33.963736999999995, 
       33.510695, 33.04127, 32.477253, 32.013624, 31.536019999999997, 31.02925, 30.541649999999997, 
       30.008646, 29.493828, 29.049707, 28.479668, 27.980956, 27.509590000000003, 27.018721, 26.533737, 25.972296, 
       25.471065, 24.979228000000003, 24.459624, 23.961517, 23.46839, 23.028454, 22.471411, 21.960924, 21.503428000000003, 
       21.007033, 20.453855, 20.013475, 19.492528, 18.995746999999998, 18.505670000000002, 18.040403, 17.603387, 17.104082, 
       16.563634, 16.138298000000002, 15.646187, 15.20897, 14.69833, 14.25156, 13.789688, 13.303409, 12.905278, 12.440909, 11.919262, 
       11.514609, 11.104646, 10.674512, 10.235055, 9.84145, 9.437523, 9.026733, 8.63639, 8.2694065, 7.944733, 7.551445, 7.231599999999999, 
       6.9697434, 6.690793299999999, 6.3989780000000005, 6.173159, 5.9157856, 5.731453, 5.4929328, 5.2866156, 5.066648000000001, 4.9190496, 
       4.745381399999999, 4.574569599999999, 4.4540283, 4.3197597000000005, 4.2694026, 4.2012034, 4.133134, 4.035212, 3.9837262, 3.9412007, 3.8503475999999996, 
       3.8178950000000005, 3.7753053999999997, 3.6728842]


dstart=20.0 

xdata=np.array(xdata[::-1])
xdata=xdata-dstart
xdata=list(xdata)

xdata1=[]
ydata1=[]
for i in range(len(xdata)):
    if xdata[i]>0:
        xdata1.append(xdata[i])
        ydata1.append(ydata[i])

xdata=np.array(xdata1)
ydata=np.array(ydata1)

popt, pcov = curve_fit(func2, xdata, ydata)
a=popt[0]

print("E=", popt[0]/10**6)


t=func2(xdata,a)

ax=pyplot.figure().add_subplot(1,1,1)
ax.plot(xdata,t, color="blue",mew=2.0,label="Hertz Fit")
ax.plot(xdata,ydata,ls="",marker="x",color="red",mew=2.0,label="Data")
ax.legend(loc=2)
pyplot.show()

The "dstart" value basically cuts off the lower portion of the data that I don't want to fit, because it is negative and the model doesn't like negative numbers. Currently I have to set "dstart" manually before running the code, and only then do I see the final result.

I started by doing this fitting in Excel with Solver to vary both the "EM" variable and the "dstart" variable simultaneously by nesting the code which adjusts the xdata by "dstart" and cuts off the negative values into the function being fit.

Essentially what I want is:

import numpy as np
import matplotlib.pyplot as pyplot
import scipy
from scipy.optimize import curve_fit




def func2(x,EM,dstart): 

    xdata=np.array(x[::-1])
    xdata=dstart-xdata
    xdata=list(xdata)

    xdata1=[]
    for i in range(len(xdata)):
        if xdata[i]>0:
            xdata1.append(xdata[i])

    global xdata2
    xdata2=np.array(xdata1)






    return (((4.0*EM*(np.sqrt(8*10**-9)))/(3.0*(1.0-(0.5**2))*8*10**-9))*(((((xdata2))*1*10**-9)**((3.0/2.0)))))


ydata=[-0.003428768, -0.009050058, -0.0037997673999999996, -0.0003833233, -0.007557649, -0.0034860994, -0.0009856887, -0.0017508664, -0.00036931394999999996, 
       -0.0040713947, -0.005737315000000001, 0.0005120568, -0.007336486, -0.00719302, -0.0039941817, -0.0029785274, -0.0013044578, -0.008190335, -0.00833507,
       -0.0074282060000000006, -0.009629990000000001, -0.009425125, -0.008662485999999999, -0.0019445216, -0.008331748, -0.009513038, -0.0047609017, -0.004364422,
       -0.010325097, -0.0036570733, -0.0060091914, -0.005655772, -0.0045517069999999995, -0.00066998035, 0.006374902, 0.006445733, 0.0019101816,
       0.010262737999999999, 0.011139007, 0.018161469, 0.016963122, 0.022915895, 0.027177791, 0.028707139, 0.040105638, 0.044088004, 0.041657403,
       0.052325636999999994, 0.062399405, 0.07020844, 0.076979915, 0.08888523, 0.099634745, 0.10961602, 0.12188646, 0.13677225, 0.15639512, 0.16833586,
       0.18849944000000002, 0.21515548, 0.23989769000000002, 0.26319308, 0.29388397, 0.321042, 0.35637776, 0.38564656999999997, 0.4185209, 0.44986692,
       0.48931552999999994, 0.52583893, 0.5626885, 0.6051665, 0.6461075, 0.69644346, 0.7447817, 0.7931281, 0.8381386000000001, 0.8883482, 0.9395609999999999,
       0.9853629, 1.0377034, 1.0889026, 1.1334094]


xdata=[34.51388, 33.963736999999995, 
       33.510695, 33.04127, 32.477253, 32.013624, 31.536019999999997, 31.02925, 30.541649999999997, 
       30.008646, 29.493828, 29.049707, 28.479668, 27.980956, 27.509590000000003, 27.018721, 26.533737, 25.972296, 
       25.471065, 24.979228000000003, 24.459624, 23.961517, 23.46839, 23.028454, 22.471411, 21.960924, 21.503428000000003, 
       21.007033, 20.453855, 20.013475, 19.492528, 18.995746999999998, 18.505670000000002, 18.040403, 17.603387, 17.104082, 
       16.563634, 16.138298000000002, 15.646187, 15.20897, 14.69833, 14.25156, 13.789688, 13.303409, 12.905278, 12.440909, 11.919262, 
       11.514609, 11.104646, 10.674512, 10.235055, 9.84145, 9.437523, 9.026733, 8.63639, 8.2694065, 7.944733, 7.551445, 7.231599999999999, 
       6.9697434, 6.690793299999999, 6.3989780000000005, 6.173159, 5.9157856, 5.731453, 5.4929328, 5.2866156, 5.066648000000001, 4.9190496, 
       4.745381399999999, 4.574569599999999, 4.4540283, 4.3197597000000005, 4.2694026, 4.2012034, 4.133134, 4.035212, 3.9837262, 3.9412007, 3.8503475999999996, 
       3.8178950000000005, 3.7753053999999997, 3.6728842]

xdata2=list(xdata2)
ydata1=[]
for i in range(len(xdata2)):
    if xdata2[i]>0:
        ydata1.append(ydata[i])




popt, pcov = curve_fit(func2, xdata, ydata)

But this doesn't work: I get the error "ValueError: operands could not be broadcast together with shapes (28,) (30,)". I think what I need is for curve_fit to bring in the xdata, adjust it by the first guessed "dstart", guess EM and check for fit and minimized error, then try a new "dstart" to adjust xdata, guess EM and check for fit and minimized error, and so on. As I'm still fairly new to Python I'm definitely out of my element with curve_fit, and I would just use Excel if I didn't have potentially thousands of curves to run.

Any help would be appreciated!

Answer 1

I'll split this into two parts: conceptual and coding-related.

Conceptual:

Let's start by rephrasing your question. As it stands the answer is: Yes, obviously. Simply absorb the parameter-dependent change of x in the target function. But that won't solve your problem. What you really seem to be interested in is what to do with parameters for which some of the x cannot be processed by your function. There is no one-size-fits-all for that.

You could choose to deem such parameters as unacceptable in which case you'd have to resort to constrained optimisation. There are a few solvers in scipy that can do that.

You could choose to remove the difficult points from the data set before fitting.

You could introduce soft constraints and penalise bad values instead of ruling them out completely.
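As a rough illustration, the first two options can be combined: drop the points you know you cannot process, then bound "dstart" so that it never produces negative depths on the remaining points. This is only a sketch under several assumptions: the data is synthetic and noise-free (not your measurements), the depth is taken to be dstart - x, the modulus is fitted in MPa so both parameters have similar magnitude, and the names hertz, keep and the cut-off at 18 are made up for the example. The bounds argument of curve_fit is the only constrained-optimisation feature used.

```python
import numpy as np
from scipy.optimize import curve_fit

R = 8e-9   # radius in m, taken from the constants in the question
NU = 0.5   # Poisson ratio, also from the question
K = 4.0 * np.sqrt(R) / (3.0 * (1.0 - NU**2) * R)

def hertz(x, E_mpa, dstart):
    # Depth in nm; stays non-negative as long as dstart >= x.max()
    depth = dstart - x
    return K * (E_mpa * 1e6) * (depth * 1e-9) ** 1.5

# Synthetic stand-in data (assumption): E = 30 MPa, dstart = 20,
# with a flat zero baseline for x above 20
x = np.linspace(35.0, 3.5, 80)
y = hertz(np.minimum(x, 20.0), 30.0, 20.0)

# Remove the difficult points before fitting (hypothetical cut-off)
keep = x < 18.0
xk, yk = x[keep], y[keep]

# Rule out parameters that would give a negative depth on the kept points
lower = [0.0, xk.max()]
upper = [np.inf, x.max()]
popt, pcov = curve_fit(hertz, xk, yk, p0=[20.0, 22.0], bounds=(lower, upper))
E_fit, dstart_fit = popt
print("E =", E_fit, "MPa, dstart =", dstart_fit)
```

The price of this approach is that the cut-off has to be chosen conservatively by hand, so it only partly removes the manual step.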

Programming style:

Avoid for loops in numerical programs. There are gazillions of posts on that on this site, so I'll only give one example:

xdata2=list(xdata2)
ydata1=[]
for i in range(len(xdata2)):
    if xdata2[i]>0:
        ydata1.append(ydata[i])

can be written in one line that executes much faster and returns an array instead of a list (provided ydata and xdata2 are NumPy arrays rather than lists):

ydata1 = ydata[xdata2 > 0]

Look at the NumPy tutorial/docs, or search this site for "vectorization", if you want to learn this technique.
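A small sketch of the same idea applied to both arrays at once, so that x and y stay aligned. The numbers are toy values invented for the illustration, and the names mask, x_kept and y_kept are not from your code.

```python
import numpy as np

# Toy values, made up for this example
x = np.array([-2.0, -1.0, 0.5, 1.5])
y = np.array([10.0, 20.0, 30.0, 40.0])

# One boolean mask, computed once and applied to both arrays
mask = x > 0
x_kept = x[mask]
y_kept = y[mask]

print(x_kept, y_kept)
```

Because the same mask indexes both arrays, they always end up with the same length.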

Apart from that, no complaints.

Why your second program doesn't work.

You are sieving both your x and your y, so they should have the same shape. But then you pass the original, unfiltered ydata to curve_fit instead of the new ydata1, whereas func2 does use the new, filtered x. That's why the shapes don't match.

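By the way, the way you've set it up (modifying x within func2) is more or less implementing the absorb strategy I mentioned earlier. The catch is that inside func2 you have no access to y, so you cannot change the shape of x: the function must return one value for every element of the x it is given.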
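One way to keep the shape fixed is to clamp the shifted x at zero instead of dropping points, so that func2 returns 0 wherever the depth would be negative. The following is a sketch, not a drop-in fix: it assumes the depth is dstart - x, it assumes a zero baseline is acceptable for your data, and it runs on synthetic noise-free data rather than your measurements. A starting guess p0 is supplied because the default guess of 1 for every parameter would put dstart below all x values and make the model flat.

```python
import numpy as np
from scipy.optimize import curve_fit

R = 8e-9   # radius in m, taken from the constants in the question
NU = 0.5   # Poisson ratio, also from the question
K = 4.0 * np.sqrt(R) / (3.0 * (1.0 - NU**2) * R)

def func2(x, EM, dstart):
    # Clamp instead of filtering: negative depths become 0,
    # so the output always has the same shape as x
    depth = np.clip(dstart - x, 0.0, None)
    return K * EM * (depth * 1e-9) ** 1.5

# Synthetic stand-in data (assumption): EM = 3e7 Pa, dstart = 20
x = np.linspace(35.0, 3.5, 80)
y = func2(x, 3e7, 20.0)

# Both parameters are fitted simultaneously from a rough starting guess
popt, pcov = curve_fit(func2, x, y, p0=[2e7, 22.0])
EM_fit, dstart_fit = popt
print("E =", EM_fit / 10**6, "MPa, dstart =", dstart_fit)
```

With real, noisy data the flat part then contributes to the residuals as well, which may or may not be what you want.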
