[英]scipy.optimize.curve_fit for a function with complex dependence on variable parameters
I'm relatively new to using Python to fit my data, so please excuse my lack of programming finesse. 我相对较不熟悉使用Python来拟合数据,因此请原谅我缺乏编程技巧的人。 I have, however, been unable to find a solution to the errors my current curve fitting attempts are throwing.
但是,我一直无法找到解决方案,以解决当前曲线拟合尝试所引发的错误。 I believe these errors are due to my model function's complex dependence on one of the two variable parameters (namely, Kd).
我相信这些错误是由于我的模型函数对两个变量参数之一(即Kd)的复杂依赖性所致。 I would appreciate insight regarding what specifically is causing this problem and how I may adjust my definition or fitting package to avoid it.
我将对导致此问题的原因以及如何调整定义或合适的包装以免出现此问题的见解表示赞赏。 Minimal working example to follow:
遵循的最小工作示例:
# Import libraries
import scipy as scipy
from scipy import stats
import numpy as np
from scipy.optimize import curve_fit
np.set_printoptions(precision=4)
ConcSyringeTotal = 9.5 ## total monomer concentration in the syringe [M]tot, in mM
Vinj = 10 ## volume injected in each injection, in uL
Vinit = 1250 ## volume of solvent initially in the sample cell, in uL
Vcell = 1000 ## cell volume, in uL (only the heat change within this volume is measured)
Injections = np.arange(2.00,26.00,1.00)
print Injections
Energy = np.array([136.953, 105.119, 84.414, 69.373, 60.898, 52.813, 46.187, 39.653, 33.894, 29.975, 27.315, 24.200, 21.643, 19.080, 16.158, 13.454, 13.218, 11.568, 10.742, 9.547, 8.693, 7.334, 6.111, 4.741])
print Energy
def DimerDissociation(injection, Kd, DHd): ## a dimer dissociation model for an ITC dilution experiment
## returns the heat flow (y-data, in ucal) per injection (x-data, unitless)
## fit for the dissociation constant (Kd, in mM = mmol/L, umol/mL, nmol/uL)
## and the enthalpy of dissociation (DHd, in ucal/nmol = kcal/mol)
##
## concentration (in mM) of the free monomer in the cell after equilibration of the i-th injection
VolumeAdded = 6+(injection-1)*Vinj ## in uL
VolumeTotal = Vinit + VolumeAdded ## in uL
CellTotal = ConcSyringeTotal*VolumeAdded ## Total in the cell after the i-th injection, in nmol
ConcCellTotal = CellTotal/VolumeTotal ## Total concentration in the cell after the i-th injection, in mM
ConcCellMonomer_roots = np.roots([1, Kd/2, -Kd*ConcCellTotal/2])
ConcCellMonomer_real = ConcCellMonomer_roots.real[abs(ConcCellMonomer_roots.imag)<1e-5]
ConcCellMonomer_positive = ConcCellMonomer_real[ConcCellMonomer_real>0]
ConcCellMonomer = ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal]
##
## concentration (in mM) of the free monomer in the syringe
ConcSyringeMonomer_roots = np.roots([1, Kd/2, -Kd*ConcSyringeTotal/2])
ConcSyringeMonomer_real = ConcSyringeMonomer_roots.real[abs(ConcSyringeMonomer_roots.imag)<1e-5]
ConcSyringeMonomer_positive = ConcSyringeMonomer_real[ConcSyringeMonomer_real>0]
ConcSyringeMonomer = ConcSyringeMonomer_positive[ConcSyringeMonomer_positive<ConcSyringeTotal]
## nmol of the free monomer injected from the syringe
SyringeMonomerInjected = Vinj*ConcSyringeMonomer[0]
##
## concentration (in mM) of the free monomer in the cell before the i-th injection
VolumeAddedPre = 6+(injection-2)*Vinj
VolumeTotalPre = Vinit + VolumeAddedPre
CellTotalPre = ConcSyringeTotal*VolumeAddedPre
ConcCellTotalPre = CellTotalPre/VolumeTotalPre
ConcCellMonomerPre_roots = np.roots([1, Kd/2, -Kd*ConcCellTotalPre/2])
ConcCellMonomerPre_real = ConcCellMonomerPre_roots.real[abs(ConcCellMonomerPre_roots.imag)<1e-5]
ConcCellMonomerPre_positive = ConcCellMonomerPre_real[ConcCellMonomerPre_real>0]
ConcCellMonomerPre = ConcCellMonomerPre_positive[ConcCellMonomerPre_positive<ConcCellTotalPre]
## nmol of the free monomer in the cell before the i-th injection
CellMonomerPre = VolumeTotalPre*ConcCellMonomerPre[0]
##
## concentration of the free monomer before equilibration of the i-th injection, in mM
ConcCellMonomerBefore = (CellMonomerPre+SyringeMonomerInjected)/VolumeAdded
## concentration of the free monomer after equilibration of the i-th injection, in mM
ConcCellMonomerAfter = ConcCellMonomer[0]
## change in concentration of the free monomer over the equilibration of the i-th injection, in mM
ConcCellMonomerChange = ConcCellMonomerAfter - ConcCellMonomerBefore
##
return Vcell*DHd*ConcCellMonomerChange
DimerDissociation_opt, DimerDissociation_cov = curve_fit(DimerDissociation, Injections, Energy, p0=[0.4,10])
DimerDissociation_stdev = np.sqrt(np.diag(DimerDissociation_cov))
print "optimized parameters:", DimerDissociation_opt
print "covariance matrix:", DimerDissociation_cov
print "standard deviation of fit parameters:", DimerDissociation_stdev
And the associated error: 和相关的错误:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-38-b5ef2361feed> in <module>()
52 ##
53 return Vcell*DHd*ConcCellMonomerChange
---> 54 DimerDissociation_opt, DimerDissociation_cov = curve_fit(DimerDissociation, Injections, Energy, p0=[0.4,10])
55 DimerDissociation_stdev = np.sqrt(np.diag(DimerDissociation_cov))
56 print "optimized parameters:", DimerDissociation_opt
//anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, **kw)
553 # Remove full_output from kw, otherwise we're passing it in twice.
554 return_full = kw.pop('full_output', False)
--> 555 res = leastsq(func, p0, args=args, full_output=1, **kw)
556 (popt, pcov, infodict, errmsg, ier) = res
557
//anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in leastsq(func, x0, args, Dfun, full_output, col_deriv, ftol, xtol, gtol, maxfev, epsfcn, factor, diag)
367 if not isinstance(args, tuple):
368 args = (args,)
--> 369 shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
370 m = shape[0]
371 if n > m:
//anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in _check_func(checker, argname, thefunc, x0, args, numinputs, output_shape)
18 def _check_func(checker, argname, thefunc, x0, args, numinputs,
19 output_shape=None):
---> 20 res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
21 if (output_shape is not None) and (shape(res) != output_shape):
22 if (output_shape[0] != 1):
//anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in _general_function(params, xdata, ydata, function)
443
444 def _general_function(params, xdata, ydata, function):
--> 445 return function(xdata, *params) - ydata
446
447
<ipython-input-38-b5ef2361feed> in DimerDissociation(injection, Kd, DHd)
19 CellTotal = ConcSyringeTotal*VolumeAdded ## Total in the cell after the i-th injection, in nmol
20 ConcCellTotal = CellTotal/VolumeTotal ## Total concentration in the cell after the i-th injection, in mM
---> 21 ConcCellMonomer_roots = np.roots([1, Kd/2, -Kd*ConcCellTotal/2])
22 ConcCellMonomer_real = ConcCellMonomer_roots.real[abs(ConcCellMonomer_roots.imag)<1e-5]
23 ConcCellMonomer_positive = ConcCellMonomer_real[ConcCellMonomer_real>0]
//anaconda/lib/python2.7/site-packages/numpy/lib/polynomial.pyc in roots(p)
199 """
200 # If input is scalar, this makes it an array
--> 201 p = atleast_1d(p)
202 if len(p.shape) != 1:
203 raise ValueError("Input must be a rank-1 array.")
//anaconda/lib/python2.7/site-packages/numpy/core/shape_base.pyc in atleast_1d(*arys)
47 res = []
48 for ary in arys:
---> 49 ary = asanyarray(ary)
50 if len(ary.shape) == 0 :
51 result = ary.reshape(1)
//anaconda/lib/python2.7/site-packages/numpy/core/numeric.pyc in asanyarray(a, dtype, order)
512
513 """
--> 514 return array(a, dtype, copy=False, order=order, subok=True)
515
516 def ascontiguousarray(a, dtype=None):
ValueError: setting an array element with a sequence.
The problem is that numpy.curve_fit
passes the xdata to your objective function as an array. 问题是
numpy.curve_fit
将xdata作为数组传递给目标函数。 This means that all the operations on injection
in DimerDissociation
are actually array operations. 这意味着
DimerDissociation
中所有关于injection
的操作实际上都是数组操作。 Consequently, ConcCellTotal
is also an array (check this by inserting print type(ConcCellTotal)
at line 27 in your code). 因此,
ConcCellTotal
也是一个数组(通过在代码的第27行插入print type(ConcCellTotal)
进行检查)。 That means your call to np.roots
looks like np.roots([scalar, scalar, array])
, which is the source of the error. 这意味着您对
np.roots
的调用看起来像np.roots([scalar, scalar, array])
,这是错误的根源。
I always get turned around when I work with these things, but I think the idea is that the objective function for the optimizer should be completely vectorized; 当我处理这些事情时,我总是会转身,但是我认为这种想法是,优化程序的目标函数应该完全矢量化。 each time it's called, it needs to return an array with one energy value for every injection value.
每次调用时,它需要为每个注入值返回一个具有一个能量值的数组。
I fixed the error above by explicitly making ConcCellMonomer_roots
an array, and I also threw in some naive reporting on variable states: 我通过显式使
ConcCellMonomer_roots
成为数组来解决上述错误,并且我还对变量状态进行了一些幼稚的报告:
def DimerDissociation(injection, Kd, DHd):
print 'Called DimerDissociation'
VolumeAdded = 6.0+(injection-1.0)*Vinj ## in uL
VolumeTotal = Vinit + VolumeAdded ## in uL
CellTotal = ConcSyringeTotal*VolumeAdded ## Total in the cell after the i-th injection, in nmol
ConcCellTotal = CellTotal/VolumeTotal ## Total concentration in the cell after the i-th injection, in mM
print 'total\t',np.shape(ConcCellTotal)
ConcCellMonomer_roots = np.asarray([np.roots([1.0, Kd/2.0, -Kd*i/2.0]) for i in ConcCellTotal])
print 'roots\t',np.shape(ConcCellMonomer_roots)
ConcCellMonomer_real = ConcCellMonomer_roots.real[abs(ConcCellMonomer_roots.imag)<1e-5]
print 'real\t',np.shape(ConcCellMonomer_real)
ConcCellMonomer_positive = ConcCellMonomer_real[ConcCellMonomer_real>0]
print 'positive\t',np.shape(ConcCellMonomer_positive)
ConcCellMonomer = ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal]
print 'monomer\t',np.shape(ConcCellMonomer)
I also made the corresponding correction to ConcCellMonomerPre_roots
using np.asarray
. 我还使用
np.asarray
对ConcCellMonomerPre_roots
进行了相应的更正。 With those edits, I get the optimizer to iterate a few times until ConcCellMonomer_roots
contains some imaginary values. 通过这些编辑,我使优化器进行了几次迭代,直到
ConcCellMonomer_roots
包含一些虚拟值。 Once that happens, ConCellMonomer_real
is no longer the same shape as ConcCellTotal
, so the line ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal]
throws a broadcast error. 一旦发生这种情况,
ConCellMonomer_real
不再与ConcCellTotal
相同,因此ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal]
引发广播错误。 The calls to DimerDissociation
give this output: 对
DimerDissociation
的调用给出以下输出:
Called DimerDissociation
total (24,)
roots (24, 2)
real (48,)
positive(24,)
monomer (24,)
Until the last iteration: 直到最后一次迭代:
Called DimerDissociation
total (24,)
roots (24, 2)
real (4,)
positive(4,)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "C:/Users/Devin/Documents/Python Scripts/SO.py", line 66, in <module>
DimerDissociation_opt, DimerDissociation_cov = curve_fit(DimerDissociation, Injections, Energy, p0=[0.4,10])
File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 533, in curve_fit
res = leastsq(func, p0, args=args, full_output=1, **kw)
File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 378, in leastsq
gtol, maxfev, epsfcn, factor, diag)
File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 444, in _general_function
return function(xdata, *params) - ydata
File "C:/Users/Devin/Documents/Python Scripts/SO.py", line 35, in DimerDissociation
ConcCellMonomer = ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal]
ValueError: operands could not be broadcast together with shapes (4) (24)
Hopefully this puts you on the right track, although I'm not an expert here and someone else might have some much better ideas. 希望这能使您走上正确的道路,尽管我不是此处的专家,并且其他人可能有更好的主意。
I was not able to reproduce your error. 我无法重现您的错误。 The first problem I noticed is your use of
np.roots
. 我注意到的第一个问题是您对
np.roots
的使用。 roots(p)
returns the roots of a polynomial specified by the coefficients in p
, specifically p[0] + p[1] * x + p[2] * x**2 + ...
. roots(p)
返回由p
的系数指定的多项式的根,特别是p[0] + p[1] * x + p[2] * x**2 + ...
Your third coefficient, -Kd*ConcCellTotal/2
is a function of injections
, which was an array. 第三个系数
-Kd*ConcCellTotal/2
是injections
的函数,它是一个数组。 There's no documented signature for np.roots
that allows an array to be passed as one of the members of p
. 没有
np.roots
的书面签名,该签名允许将数组作为p
的成员之一传递。
Can you edit and clarify? 您可以编辑和澄清吗?
-Ravi -Ravi
PS A toy example demonstrating how curve_fit
works: PS一个玩具示例,演示
curve_fit
工作原理:
import numpy as np
from scipy.optimize import curve_fit
x_in = np.array([-3.0,-2.0,-1.0,0.0,1.0,2.0,3.0])
def f(x,a,b):
return a*x+b
y_in = f(x_in,3,2)
parameters_fit,cov = curve_fit(f,x_in,y_in)
y_out = parameters_fit[0]*x_in+parameters_fit[1]
print parameters_fit
print y_in
print y_out
y_in = f(x_in,10,75)
parameters_fit,cov = curve_fit(f,x_in,y_in)
y_out = parameters_fit[0]*x_in+parameters_fit[1]
print parameters_fit
print y_in
print y_out
The objective function f
takes as arguments an array of x-values and one or more parameters. 目标函数
f
将x值和一个或多个参数的数组作为参数。 curve_fit
takes as arguments the objective function, an array of x-values x_in , and an array of y-values y_in as arguments. curve_fit
将目标函数,x值x_in的数组和y值y_in的数组作为参数作为参数。 It then makes up some values for the parameters a and b and evaluates the objective function on x_in , which gives an array y_out . 然后,它为参数a和b组成一些值,并在x_in上评估目标函数,该目标函数给出了数组y_out 。 It computes the RMS error between y_in and y_out and then tweaks its values of a and b until the RMS error is minimized.
它计算y_in和y_out之间的RMS误差,然后调整其a和b的值,直到将RMS误差最小化为止。
The devil is really in the details of how initial values for a and b are selected (if they're not supplied, as the OP did) and how they're tweaked. 实际上,细节在于如何选择a和b的初始值(如果没有提供它们,如OP那样)以及如何对其进行调整。 That's complicated, but not absolutely necessary for us
scipy.optimize
users to understand perfectly. 这很复杂,但对于我们
scipy.optimize
用户以使其完全理解并不是绝对必要的。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.