简体   繁体   English

scipy.optimize.curve_fit用于复杂依赖变量参数的函数

[英]scipy.optimize.curve_fit for a function with complex dependence on variable parameters

I'm relatively new to using Python to fit my data, so please excuse my lack of programming finesse. 我相对较不熟悉使用Python来拟合数据,因此请原谅我缺乏编程技巧的人。 I have, however, been unable to find a solution to the errors my current curve fitting attempts are throwing. 但是,我一直无法找到解决方案,以解决当前曲线拟合尝试所引发的错误。 I believe these errors are due to my model function's complex dependence on one of the two variable parameters (namely, Kd). 我相信这些错误是由于我的模型函数对两个变量参数之一(即Kd)的复杂依赖性所致。 I would appreciate insight regarding what specifically is causing this problem and how I may adjust my definition or fitting package to avoid it. 我将对导致此问题的原因以及如何调整定义或合适的包装以免出现此问题的见解表示赞赏。 Minimal working example to follow: 遵循的最小工作示例:

# Import libraries
import scipy as scipy
from scipy import stats
import numpy as np
from scipy.optimize import curve_fit

np.set_printoptions(precision=4)
ConcSyringeTotal = 9.5 ## total monomer concentration in the syringe [M]tot, in mM
Vinj = 10 ## volume injected in each injection, in uL
Vinit = 1250 ## volume of solvent initially in the sample cell, in uL
Vcell = 1000 ## cell volume, in uL (only the heat change within this volume is measured)
Injections = np.arange(2.00,26.00,1.00)
print Injections
Energy = np.array([136.953, 105.119, 84.414, 69.373, 60.898, 52.813, 46.187, 39.653, 33.894, 29.975, 27.315, 24.200, 21.643, 19.080, 16.158, 13.454, 13.218, 11.568, 10.742, 9.547, 8.693, 7.334, 6.111, 4.741])
print Energy

def DimerDissociation(injection, Kd, DHd): ## a dimer dissociation model for an ITC dilution experiment
    ## returns the heat flow (y-data, in ucal) per injection (x-data, unitless)
    ## fit for the dissociation constant (Kd, in mM = mmol/L, umol/mL, nmol/uL) 
    ## and the enthalpy of dissociation (DHd, in ucal/nmol = kcal/mol)
    ##
    ## concentration (in mM) of the free monomer in the cell after equilibration of the i-th injection
    VolumeAdded = 6+(injection-1)*Vinj ## in uL
    VolumeTotal = Vinit + VolumeAdded ## in uL
    CellTotal = ConcSyringeTotal*VolumeAdded ## Total in the cell after the i-th injection, in nmol
    ConcCellTotal = CellTotal/VolumeTotal ## Total concentration in the cell after the i-th injection, in mM
    ConcCellMonomer_roots =  np.roots([1, Kd/2, -Kd*ConcCellTotal/2]) 
    ConcCellMonomer_real = ConcCellMonomer_roots.real[abs(ConcCellMonomer_roots.imag)<1e-5]
    ConcCellMonomer_positive = ConcCellMonomer_real[ConcCellMonomer_real>0]
    ConcCellMonomer = ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal]
    ##
    ## concentration (in mM) of the free monomer in the syringe
    ConcSyringeMonomer_roots =  np.roots([1, Kd/2, -Kd*ConcSyringeTotal/2]) 
    ConcSyringeMonomer_real = ConcSyringeMonomer_roots.real[abs(ConcSyringeMonomer_roots.imag)<1e-5]
    ConcSyringeMonomer_positive = ConcSyringeMonomer_real[ConcSyringeMonomer_real>0]
    ConcSyringeMonomer = ConcSyringeMonomer_positive[ConcSyringeMonomer_positive<ConcSyringeTotal]
    ## nmol of the free monomer injected from the syringe
    SyringeMonomerInjected = Vinj*ConcSyringeMonomer[0]
    ##
    ## concentration (in mM) of the free monomer in the cell before the i-th injection
    VolumeAddedPre = 6+(injection-2)*Vinj
    VolumeTotalPre = Vinit + VolumeAddedPre
    CellTotalPre = ConcSyringeTotal*VolumeAddedPre
    ConcCellTotalPre = CellTotalPre/VolumeTotalPre
    ConcCellMonomerPre_roots =  np.roots([1, Kd/2, -Kd*ConcCellTotalPre/2]) 
    ConcCellMonomerPre_real = ConcCellMonomerPre_roots.real[abs(ConcCellMonomerPre_roots.imag)<1e-5]
    ConcCellMonomerPre_positive = ConcCellMonomerPre_real[ConcCellMonomerPre_real>0]
    ConcCellMonomerPre = ConcCellMonomerPre_positive[ConcCellMonomerPre_positive<ConcCellTotalPre]
    ## nmol of the free monomer in the cell before the i-th injection
    CellMonomerPre = VolumeTotalPre*ConcCellMonomerPre[0]
    ##
    ## concentration of the free monomer before equilibration of the i-th injection, in mM
    ConcCellMonomerBefore = (CellMonomerPre+SyringeMonomerInjected)/VolumeAdded
    ## concentration of the free monomer after equilibration of the i-th injection, in mM
    ConcCellMonomerAfter = ConcCellMonomer[0]
    ## change in concentration of the free monomer over the equilibration of the i-th injection, in mM
    ConcCellMonomerChange = ConcCellMonomerAfter - ConcCellMonomerBefore
    ##
    return Vcell*DHd*ConcCellMonomerChange
DimerDissociation_opt, DimerDissociation_cov = curve_fit(DimerDissociation, Injections, Energy, p0=[0.4,10])
DimerDissociation_stdev = np.sqrt(np.diag(DimerDissociation_cov))
print "optimized parameters:", DimerDissociation_opt
print "covariance matrix:", DimerDissociation_cov
print "standard deviation of fit parameters:", DimerDissociation_stdev

And the associated error: 和相关的错误:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-b5ef2361feed> in <module>()
     52     ##
     53     return Vcell*DHd*ConcCellMonomerChange
---> 54 DimerDissociation_opt, DimerDissociation_cov = curve_fit(DimerDissociation, Injections, Energy, p0=[0.4,10])
     55 DimerDissociation_stdev = np.sqrt(np.diag(DimerDissociation_cov))
     56 print "optimized parameters:", DimerDissociation_opt

//anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, **kw)
    553     # Remove full_output from kw, otherwise we're passing it in twice.
    554     return_full = kw.pop('full_output', False)
--> 555     res = leastsq(func, p0, args=args, full_output=1, **kw)
    556     (popt, pcov, infodict, errmsg, ier) = res
    557 

//anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in leastsq(func, x0, args, Dfun, full_output, col_deriv, ftol, xtol, gtol, maxfev, epsfcn, factor, diag)
    367     if not isinstance(args, tuple):
    368         args = (args,)
--> 369     shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
    370     m = shape[0]
    371     if n > m:

//anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in _check_func(checker, argname, thefunc, x0, args, numinputs, output_shape)
     18 def _check_func(checker, argname, thefunc, x0, args, numinputs,
     19                 output_shape=None):
---> 20     res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
     21     if (output_shape is not None) and (shape(res) != output_shape):
     22         if (output_shape[0] != 1):

//anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in _general_function(params, xdata, ydata, function)
    443 
    444 def _general_function(params, xdata, ydata, function):
--> 445     return function(xdata, *params) - ydata
    446 
    447 

<ipython-input-38-b5ef2361feed> in DimerDissociation(injection, Kd, DHd)
     19     CellTotal = ConcSyringeTotal*VolumeAdded ## Total in the cell after the i-th injection, in nmol
     20     ConcCellTotal = CellTotal/VolumeTotal ## Total concentration in the cell after the i-th injection, in mM
---> 21     ConcCellMonomer_roots =  np.roots([1, Kd/2, -Kd*ConcCellTotal/2])
     22     ConcCellMonomer_real = ConcCellMonomer_roots.real[abs(ConcCellMonomer_roots.imag)<1e-5]
     23     ConcCellMonomer_positive = ConcCellMonomer_real[ConcCellMonomer_real>0]

//anaconda/lib/python2.7/site-packages/numpy/lib/polynomial.pyc in roots(p)
    199     """
    200     # If input is scalar, this makes it an array
--> 201     p = atleast_1d(p)
    202     if len(p.shape) != 1:
    203         raise ValueError("Input must be a rank-1 array.")

//anaconda/lib/python2.7/site-packages/numpy/core/shape_base.pyc in atleast_1d(*arys)
     47     res = []
     48     for ary in arys:
---> 49         ary = asanyarray(ary)
     50         if len(ary.shape) == 0 :
     51             result = ary.reshape(1)

//anaconda/lib/python2.7/site-packages/numpy/core/numeric.pyc in asanyarray(a, dtype, order)
    512 
    513     """
--> 514     return array(a, dtype, copy=False, order=order, subok=True)
    515 
    516 def ascontiguousarray(a, dtype=None):

ValueError: setting an array element with a sequence.

The problem is that numpy.curve_fit passes the xdata to your objective function as an array. 问题是numpy.curve_fit将xdata作为数组传递给目标函数。 This means that all the operations on injection in DimerDissociation are actually array operations. 这意味着DimerDissociation中所有关于injection的操作实际上都是数组操作。 Consequently, ConcCellTotal is also an array (check this by inserting print type(ConcCellTotal) at line 27 in your code). 因此, ConcCellTotal也是一个数组(通过在代码的第27行插入print type(ConcCellTotal)进行检查)。 That means your call to np.roots looks like np.roots([scalar, scalar, array]) , which is the source of the error. 这意味着您对np.roots的调用看起来像np.roots([scalar, scalar, array]) ,这是错误的根源。

I always get turned around when I work with these things, but I think the idea is that the objective function for the optimizer should be completely vectorized; 当我处理这些事情时,我总是会转身,但是我认为这种想法是,优化程序的目标函数应该完全矢量化。 each time it's called, it needs to return an array with one energy value for every injection value. 每次调用时,它需要为每个注入值返回一个具有一个能量值的数组。

I fixed the error above by explicitly making ConcCellMonomer_roots an array, and I also threw in some naive reporting on variable states: 我通过显式使ConcCellMonomer_roots成为数组来解决上述错误,并且我还对变量状态进行了一些幼稚的报告:

def DimerDissociation(injection, Kd, DHd): 
    print 'Called DimerDissociation'
    VolumeAdded = 6.0+(injection-1.0)*Vinj ## in uL
    VolumeTotal = Vinit + VolumeAdded ## in uL
    CellTotal = ConcSyringeTotal*VolumeAdded ## Total in the cell after the i-th injection, in nmol
    ConcCellTotal = CellTotal/VolumeTotal ## Total concentration in the cell after the i-th injection, in mM
    print 'total\t',np.shape(ConcCellTotal)
    ConcCellMonomer_roots =  np.asarray([np.roots([1.0, Kd/2.0, -Kd*i/2.0]) for i in ConcCellTotal])
    print 'roots\t',np.shape(ConcCellMonomer_roots)
    ConcCellMonomer_real = ConcCellMonomer_roots.real[abs(ConcCellMonomer_roots.imag)<1e-5]
    print 'real\t',np.shape(ConcCellMonomer_real)
    ConcCellMonomer_positive = ConcCellMonomer_real[ConcCellMonomer_real>0]
    print 'positive\t',np.shape(ConcCellMonomer_positive)
    ConcCellMonomer = ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal]
    print 'monomer\t',np.shape(ConcCellMonomer)

I also made the corresponding correction to ConcCellMonomerPre_roots using np.asarray . 我还使用np.asarrayConcCellMonomerPre_roots进行了相应的更正。 With those edits, I get the optimizer to iterate a few times until ConcCellMonomer_roots contains some imaginary values. 通过这些编辑,我使优化器进行了几次迭代,直到ConcCellMonomer_roots包含一些虚拟值。 Once that happens, ConCellMonomer_real is no longer the same shape as ConcCellTotal , so the line ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal] throws a broadcast error. 一旦发生这种情况, ConCellMonomer_real不再与ConcCellTotal相同,因此ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal]引发广播错误。 The calls to DimerDissociation give this output: DimerDissociation的调用给出以下输出:

Called DimerDissociation
total   (24,)
roots   (24, 2)
real    (48,)
positive(24,)
monomer (24,)

Until the last iteration: 直到最后一次迭代:

Called DimerDissociation
total   (24,)
roots   (24, 2)
real    (4,)
positive(4,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
    execfile(filename, namespace)
  File "C:/Users/Devin/Documents/Python Scripts/SO.py", line 66, in <module>
    DimerDissociation_opt, DimerDissociation_cov = curve_fit(DimerDissociation, Injections, Energy, p0=[0.4,10])
  File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 533, in curve_fit
    res = leastsq(func, p0, args=args, full_output=1, **kw)
  File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 378, in leastsq
    gtol, maxfev, epsfcn, factor, diag)
  File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 444, in _general_function
    return function(xdata, *params) - ydata
  File "C:/Users/Devin/Documents/Python Scripts/SO.py", line 35, in DimerDissociation
    ConcCellMonomer = ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal]
ValueError: operands could not be broadcast together with shapes (4) (24) 

Hopefully this puts you on the right track, although I'm not an expert here and someone else might have some much better ideas. 希望这能使您走上正确的道路,尽管我不是此处的专家,并且其他人可能有更好的主意。

I was not able to reproduce your error. 我无法重现您的错误。 The first problem I noticed is your use of np.roots . 我注意到的第一个问题是您对np.roots的使用。 roots(p) returns the roots of a polynomial specified by the coefficients in p , specifically p[0] + p[1] * x + p[2] * x**2 + ... . roots(p)返回由p的系数指定的多项式的根,特别是p[0] + p[1] * x + p[2] * x**2 + ... Your third coefficient, -Kd*ConcCellTotal/2 is a function of injections , which was an array. 第三个系数-Kd*ConcCellTotal/2injections的函数,它是一个数组。 There's no documented signature for np.roots that allows an array to be passed as one of the members of p . 没有np.roots的书面签名,该签名允许将数组作为p的成员之一传递。

Can you edit and clarify? 您可以编辑和澄清吗?

-Ravi -Ravi

PS A toy example demonstrating how curve_fit works: PS一个玩具示例,演示curve_fit工作原理:

import numpy as np
from scipy.optimize import curve_fit

x_in = np.array([-3.0,-2.0,-1.0,0.0,1.0,2.0,3.0])

def f(x,a,b):
    return a*x+b

y_in = f(x_in,3,2)
parameters_fit,cov = curve_fit(f,x_in,y_in)
y_out = parameters_fit[0]*x_in+parameters_fit[1]
print parameters_fit
print y_in
print y_out

y_in = f(x_in,10,75)
parameters_fit,cov = curve_fit(f,x_in,y_in)
y_out = parameters_fit[0]*x_in+parameters_fit[1]
print parameters_fit
print y_in
print y_out

The objective function f takes as arguments an array of x-values and one or more parameters. 目标函数 f将x值和一个或多个参数的数组作为参数。 curve_fit takes as arguments the objective function, an array of x-values x_in , and an array of y-values y_in as arguments. curve_fit将目标函数,x值x_in的数组和y值y_in的数组作为参数作为参数。 It then makes up some values for the parameters a and b and evaluates the objective function on x_in , which gives an array y_out . 然后,它为参数ab组成一些值,并在x_in上评估目标函数,该目标函数给出了数组y_out It computes the RMS error between y_in and y_out and then tweaks its values of a and b until the RMS error is minimized. 它计算y_iny_out之间的RMS误差,然后调整其ab的值,直到将RMS误差最小化为止。

The devil is really in the details of how initial values for a and b are selected (if they're not supplied, as the OP did) and how they're tweaked. 实际上,细节在于如何选择ab的初始值(如果没有提供它们,如OP那样)以及如何对其进行调整。 That's complicated, but not absolutely necessary for us scipy.optimize users to understand perfectly. 这很复杂,但对于我们scipy.optimize用户以使其完全理解并不是绝对必要的。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM