
Global fitting using scipy.curve_fit

I have a quick question about global fitting with scipy.optimize.curve_fit. As I understand it, the only difference between setting up a local fit and a global fit is how the model functions are concatenated. Take the script below as an example:

input_data = [protein, ligand]
titration_data=input('Load titration data')

def fun(_, kd):
    a = protein
    b = protein + ligand
    c = ligand
    return np.array((b + kd - np.sqrt(((b + kd)**2) - 4*a*c))/(2*a))

kD = []
for values in titration_data:
    # each row of titration_data is one dataset
    x = ligand
    y = np.asarray(values, dtype=float).flatten()
    popt, pcov = curve_fit(fun, x, y)
    kD.append(popt[0])  # keep the kd fitted to this row

The input data is a 2x6 matrix (protein and ligand concentrations), and the titration data is an 8x6 matrix. Each row of the titration data is fitted to the model individually, which yields one kd value per row. This is a local fit; now I want to change it to a global fit. Based on my understanding of what a global fit is, I attempted the script below:

input_data = [protein, ligand]
titration_data=input('Load titration data')

glob=[]
for values in titration_data:
    def fun(_, kd):
        a = protein
        b = protein + ligand
        c = ligand
        return np.array((b + kd - np.sqrt(((b + kd)**2) - 4*a*c))/(2*a))
        print (fun)
    glob.append(fun)

def glob_fun(_,kd):
  return np.array(glob).flatten()

x = ligand
y = titration_data
popt, pcov = curve_fit(glob_fun, x, y)

As I understand it, this should now give me a single kd from fitting all of the data simultaneously. However, I get an error message when I run it:

popt, pcov = curve_fit(glob_fun, x, y)
return func(xdata, *params) - ydata
TypeError: unsupported operand type(s) for -: 'function' and 'float'

The issue seems to be that glob_fun returns an array of functions (which, as I understand it, is what a global fit needs). However, rather than taking the output of those functions (for whatever kd the optimizer chose) and minimizing its difference to ydata, curve_fit ends up operating on the function objects themselves. Hence the error: you cannot subtract a float from a function (or at least, this is my understanding of the error).
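The diagnosis can be checked without curve_fit at all. The following is a minimal sketch, not the original script: `model` is a hypothetical stand-in for the binding function and `ydata` is made-up data, both assumed only to reproduce the same kind of failure.

```python
import numpy as np

def model(_, kd):
    # Hypothetical stand-in for the binding model; the formula does not matter here.
    return np.full(3, kd)

# Appending the function itself stores a function object, not its output.
glob = [model, model]

def glob_fun(_, kd):
    # kd is never used: this returns an object array holding the functions.
    return np.array(glob).flatten()

ydata = np.array([0.1, 0.2])  # made-up data, same length as glob
result = glob_fun(None, 1.0)
print(result.dtype)  # object

# This mirrors the residual that curve_fit computes: func(xdata, *params) - ydata
try:
    result - ydata
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

The subtraction fails because NumPy applies `-` element by element to the function objects, which suggests the list should hold the values returned by the model rather than the model itself.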

Edit: I have added the data so the error and functions are reproducible.

import numpy as np
from scipy.optimize import curve_fit

concentration= np.array([[0.6 , 0.59642147, 0.5859375 , 0.56603774, 0.53003534,0.41899441],
[0.06 , 0.11928429, 0.29296875, 0.62264151, 1.21908127,3.05865922]])
protein = concentration[0,:]
ligand = concentration[1,:]

input_data = [protein, ligand]
titration_data=np.array([[0, 0, 0.29888413, 0.45540198, 0.72436899,1],
 [0,0,0.11930228, 0.35815982, 0.59396978, 1],
 [0,0,0.30214337, 0.46685577, 0.79007708, 1],
 [0,0,0.27204954, 0.56702549, 0.84013344, 1],
 [0,0,0.266836,   0.43993175, 0.74044123, 1],
 [0,0,0.28179148, 0.42406587, 0.77048624, 1],
 [0,0,0.2281092,  0.50336244, 0.79089151, 0.87029517],
 [0,0,0.18317694, 0.55478412, 0.78448465, 1]]).flatten()

glob=[]
for values in titration_data:
    def fun(_, kd):
        a = protein
        b = protein + ligand
        c = ligand
        return np.array((b + kd - np.sqrt(((b + kd)**2) - 4*a*c))/(2*a))
        print (fun)
    glob.append(fun)

def glob_fun(_,kd):
  return np.array(glob).flatten()

x = ligand
y = titration_data
popt, pcov = curve_fit(glob_fun, x, y)

You have successfully fitted single datasets. Now you want to fit the same function to multiple datasets simultaneously, i.e. perform a global fit. The datasets are stored in a two-dimensional array, where each dataset from the successful single fits runs along the inner axis. However, scipy.optimize.curve_fit expects "a length M array" for its argument ydata. As far as I understand, this means you cannot pass a nested array such as [[0], [1]], for example:

>>> from scipy.optimize import curve_fit
>>> curve_fit(lambda x, a: x, [[0], [1]], [[0], [1]])
ValueError: object too deep for desired array
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.6/site-packages/scipy/optimize/minpack.py", line 744, in curve_fit
    res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/scipy/optimize/minpack.py", line 394, in leastsq
    gtol, maxfev, epsfcn, factor, diag)
minpack.error: Result from function call is not a proper array of floats.

As you've already found out, one solution is to flatten the array, so that the datasets from the single fits are strung together, one after another. I think this is not really called "global fitting" anymore, but "concatenated fitting".

I have composed the following minimal example to show how you could do this with curve_fit :

  • First, we create some example data: x of shape (m,) and y of shape (n, m), with random noise. (The example data is printed, in case you want to take a look at it.)
  • Then, each row y_i of y is fitted locally, using a function f. (This is not necessary for the global fit, but it is nice to see the resulting lines in the plot for comparison.)
  • Finally, the global fit to the whole of y: instead of f, we have to use the function lambda x, a, b: np.tile(f(x, a, b), len(y)), which applies f to x and uses np.tile to repeat the result len(y) times (since there are n = len(y) rows in y to fit, one for each dataset). Consequently, the same a and b are used for every row of y and we get a global fit, in contrast to the individual a and b obtained from each single fit.
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

m = 5
n = 3
x = np.arange(m)
y = np.array([x + np.random.normal(0, 0.2, len(x)) for _ in range(n)])
print("x =", x)
print("y =", y)

def f(x, a, b):
    return a * x + b

# single fits to each dataset
for y_i in y:
    popt, pcov = curve_fit(f, x, y_i)
    plt.plot(x, y_i, linestyle="", marker="x")
    plt.plot(x, f(x, *popt), color=plt.gca().lines[-1].get_color())

# global fit to concatenated dataset
popt, pcov = curve_fit(lambda x, a, b: np.tile(f(x, a, b), len(y)), x, y.ravel())
plt.plot(x, f(x, *popt), linestyle="--", color="black")

plt.show()

This results, for example, in:

x = [0 1 2 3 4]
y = [[ 0.17209542  1.02497865  1.84162787  3.0763016   3.76940871]
 [-0.05657471  0.96686915  2.20283785  3.09199915  3.78047165]
 [-0.53504594  1.21865205  2.35021432  3.02407509  4.22551247]]

Figure 1

The markers are the input data y, the colored lines are the single fits to those points (in the matching color), and the dashed black line is the global fit to all points combined.
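As a sanity check on the tiling trick, here is a minimal sketch that compares it with an equivalent variant that tiles x instead of the model output. The fixed seed and the names popt_tiled and popt_xtiled are assumptions added for this illustration. Because f is linear, np.polyfit serves as an independent reference.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):
    return a * x + b

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
m, n = 5, 3
x = np.arange(m, dtype=float)
y = np.array([x + rng.normal(0, 0.2, m) for _ in range(n)])

# Variant 1: tile the model output so that it lines up with y.ravel()
popt_tiled, _ = curve_fit(lambda x, a, b: np.tile(f(x, a, b), n), x, y.ravel())

# Variant 2: tile x instead and fit f directly
popt_xtiled, _ = curve_fit(f, np.tile(x, n), y.ravel())

# Reference: ordinary linear least squares on the concatenated data
slope, intercept = np.polyfit(np.tile(x, n), y.ravel(), 1)

print("tiled model:", popt_tiled)
print("tiled x:    ", popt_xtiled)
print("polyfit:    ", slope, intercept)
```

All three should agree to within numerical tolerance, since they minimize the same sum of squared residuals. Tiling x may be the more readable choice when every dataset shares the same x values.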

Applying this example to your code should give something like this:

import numpy as np
from scipy.optimize import curve_fit

concentration = np.array(
    [
        [0.6, 0.59642147, 0.5859375, 0.56603774, 0.53003534, 0.41899441],
        [0.06, 0.11928429, 0.29296875, 0.62264151, 1.21908127, 3.05865922],
    ]
)

protein = concentration[0, :]
ligand = concentration[1, :]

titration_data = np.array(
    [
        [0, 0, 0.29888413, 0.45540198, 0.72436899, 1],
        [0, 0, 0.11930228, 0.35815982, 0.59396978, 1],
        [0, 0, 0.30214337, 0.46685577, 0.79007708, 1],
        [0, 0, 0.27204954, 0.56702549, 0.84013344, 1],
        [0, 0, 0.266836, 0.43993175, 0.74044123, 1],
        [0, 0, 0.28179148, 0.42406587, 0.77048624, 1],
        [0, 0, 0.2281092, 0.50336244, 0.79089151, 0.87029517],
        [0, 0, 0.18317694, 0.55478412, 0.78448465, 1],
    ]
)

def fun(_, kd):
    a = protein
    b = protein + ligand
    c = ligand
    return np.array((b + kd - np.sqrt(((b + kd) ** 2) - 4 * a * c)) / (2 * a))

def glob_fun(_, kd):
    return np.tile(fun(_, kd), len(titration_data))

x = ligand
y = titration_data
popt, pcov = curve_fit(glob_fun, x, y.ravel())
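To read off the result, something like the following sketch could be used. It repeats the data so that it runs on its own. The starting value p0=[0.5] and the bound kd >= 0 are assumptions that are not part of the code above; they are added only to keep the square root in the model real during optimization. The names kd and kd_err are likewise illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

protein = np.array([0.6, 0.59642147, 0.5859375, 0.56603774, 0.53003534, 0.41899441])
ligand = np.array([0.06, 0.11928429, 0.29296875, 0.62264151, 1.21908127, 3.05865922])
titration_data = np.array(
    [
        [0, 0, 0.29888413, 0.45540198, 0.72436899, 1],
        [0, 0, 0.11930228, 0.35815982, 0.59396978, 1],
        [0, 0, 0.30214337, 0.46685577, 0.79007708, 1],
        [0, 0, 0.27204954, 0.56702549, 0.84013344, 1],
        [0, 0, 0.266836, 0.43993175, 0.74044123, 1],
        [0, 0, 0.28179148, 0.42406587, 0.77048624, 1],
        [0, 0, 0.2281092, 0.50336244, 0.79089151, 0.87029517],
        [0, 0, 0.18317694, 0.55478412, 0.78448465, 1],
    ]
)

def fun(_, kd):
    # Same binding model as above; the first argument is unused.
    b = protein + ligand
    return (b + kd - np.sqrt((b + kd) ** 2 - 4 * protein * ligand)) / (2 * protein)

def glob_fun(_, kd):
    # Repeat the model once per dataset so it matches titration_data.ravel()
    return np.tile(fun(_, kd), len(titration_data))

# Assumption: p0 and bounds are not in the original code.
popt, pcov = curve_fit(
    glob_fun, ligand, titration_data.ravel(), p0=[0.5], bounds=(0, np.inf)
)

kd = popt[0]
kd_err = np.sqrt(np.diag(pcov))[0]  # one-sigma estimate from the covariance
print(f"kd = {kd:.3f} +/- {kd_err:.3f}")
```

Passing bounds makes curve_fit switch from the default Levenberg-Marquardt method to a bounded solver, so the value may differ slightly from an unbounded fit.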


 