Pass array of classes/lists/indices as input argument for scipy.optimize.curve_fit

I am using curve_fit from scipy.optimize to fit some parameters of an equation. I have several arrays of X and Y training samples, and for each (X, Y) pair an array of conditions. The conditions are also parameters of the equation, but they are known rather than fitted, and in general they differ from pair to pair. The equation is something like:

Y[i] = Equation(X[i], *C[i], *K)

with:

  • X[i] a list of x-values (n lists in total)
  • Y[i] a list of y-values (n lists in total)
  • C[i] the given parameters (n lists in total)
  • K the parameters to fit

If I only had one array of each kind, a lambda function would be enough, but that's not the case. The one idea I came up with is to use np.concatenate to join the arrays of each kind (X, Y and C) into a single array, but I can't find a way to pass them so that the function can work with them.
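For the single-dataset case mentioned above, a lambda can bind the known conditions so that curve_fit only sees the parameters to fit. A minimal sketch, assuming a hypothetical equation(x, c0, c1, k0, k1) with two known conditions, two fitted parameters, and synthetic noise-free data:

```python
import numpy as np
import scipy.optimize as opt

def equation(x, c0, c1, k0, k1):
    # Hypothetical model: c0, c1 are known conditions, k0, k1 are the parameters to fit
    return k0*x + k1*c0 + c1

x = np.linspace(0, 10, 20)
C = (1.0, 2.0)                  # known conditions for this single dataset
y = equation(x, *C, 0.5, 1.5)   # synthetic, noise-free data

# Bind the conditions with a lambda; curve_fit reads k0, k1 from its signature
model = lambda x, k0, k1: equation(x, *C, k0, k1)
popt, pcov = opt.curve_fit(model, x, y)
print(popt)  # close to [0.5, 1.5]
```

This only works while there is a single C; with one C[i] per dataset, the conditions have to travel inside xdata instead.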

I tried several ways to do this. One approach was to create a class holding both the X data and the conditions. For example:

import numpy as np
import scipy.optimize as opt

class MyClass:
    def __init__(self, a1, a2, a3):
        self.A1 = a1
        self.A2 = a2
        self.A3 = a3

f = lambda x, b, c: b*x.A1 + c*x.A2 + x.A3

X = np.linspace(0,10,20)

MyClass_array = np.array([MyClass(element,1,2) for element in X])

Y = X + 2

opt.curve_fit(f, MyClass_array, Y)

This gives me the following output:

TypeError: float() argument must be a string or a number, not 'MyClass'

I tried using lists in a similar way, as in this code:

import numpy as np
import scipy.optimize as opt

f = lambda x, b, c: b*x[0] + c*x[1] + x[2]

X = np.array([[element, 1, 2] for element in np.linspace(0,10,20)])
Y = 2 + np.linspace(0,10,20)

opt.curve_fit(f, X, Y)

Again there is an error, apparently because both arrays need to have the same shape:

ValueError: operands could not be broadcast together with shapes (3,) (20,)

Lastly, I tried to create an array of two lists with the same shape: one holds the X values and the other holds positions (indices) into a separate list where the conditions are stored, as in this code:

import numpy as np
import scipy.optimize as opt

Aux = [[1,2],[1,2]]

f = lambda x, b, c: b*x[0] + c*Aux[np.int(x[1])][0] + Aux[np.int(x[1])][0]

x1 = np.linspace(0,10,20)
x2 = np.zeros(20).astype('int')
X = np.array([x1, x2])

Y = 2 + x1

opt.curve_fit(f, X, Y)

But again, it raises:

TypeError: only size-1 arrays can be converted to Python scalars

since I can't use an array as an index. Is there any way to make the x2 values be consumed one at a time, as an index, the way the x1 values are? (I know x1 is not used as an index.)

Is there anything I can do in any of these scenarios to make it work?

This is more of a comment than an answer, but it is too long for a comment and will probably be edited beyond the 5-minute limit.

First, when talking about errors, show the full traceback; you/we need to know exactly where the error occurs. For example, is the error in curve_fit itself, in fn, or in some conversion step beforehand?

Second, make sure you understand what curve_fit expects, both from the function and from the arrays. I won't review the docs right now, but most likely it expects numeric arrays, 1-D or 2-D. Object-dtype arrays, arrays of lists, or arrays of custom class objects won't work.

At a quick glance it looks like you are trying a bunch of different things without really understanding either of the above. Debugging by trying random things does not work.

If my memory is correct, fn should be something that works like fn(X, b, c), and the result should be comparable to Y. curve_fit will pass your X to it, along with trial values of b and c, and compare the result with Y. It's a good idea to do a trial calculation of your own, e.g.

fn(X,1,1)

and check the shape and dtype, and make sure you can subtract Y from it.
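A minimal sketch of that trial calculation as a reusable check. check_model is a hypothetical helper, not part of SciPy; it converts xdata to float as curve_fit does, calls f once, and compares the output shape with ydata before any fitting happens. The two lambdas are the row-indexing and column-indexing versions of the 2nd try:

```python
import numpy as np

def check_model(f, xdata, ydata, *params):
    """Hypothetical helper: call f the way curve_fit would and compare with ydata."""
    xdata = np.asarray(xdata, dtype=float)   # curve_fit converts xdata to float as well
    ydata = np.asarray(ydata, dtype=float)
    out = np.asarray(f(xdata, *params))
    print("f output:", out.shape, out.dtype, "| ydata:", ydata.shape, ydata.dtype)
    if out.shape != ydata.shape:
        raise ValueError(f"f returned shape {out.shape}, expected {ydata.shape}")
    return out - ydata   # residual whose sum of squares curve_fit minimizes (unweighted)

X = np.array([[x, 1, 2] for x in np.linspace(0, 10, 20)])
Y = 2 + np.linspace(0, 10, 20)

f_rows = lambda x, b, c: b*x[0] + c*x[1] + x[2]           # indexes rows: wrong here
f_cols = lambda x, b, c: b*x[:, 0] + c*x[:, 1] + x[:, 2]  # indexes columns

residual = check_model(f_cols, X, Y, 1, 1)   # passes: shapes (20,) and (20,)
try:
    check_model(f_rows, X, Y, 1, 1)          # fails: shape (3,) vs (20,)
except ValueError as err:
    print(err)
```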

It often helps to include a print(X) or print(X.shape) in fn so you have a clearer idea of how curve_fit is calling it.
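One way to do that without editing the model itself is a small wrapper. traced is a hypothetical helper, not a SciPy feature; it records the shape of every xdata that curve_fit passes in and prints only the first call. Because the wrapper takes *params, curve_fit cannot count the parameters from its signature, so p0 must be given explicitly:

```python
import numpy as np
import scipy.optimize as opt

def traced(f):
    """Hypothetical wrapper: record the shapes curve_fit passes to f on every call."""
    calls = []
    def wrapper(x, *params):
        out = f(x, *params)
        if not calls:  # print only the first call; curve_fit calls f many times
            print("x:", np.shape(x), np.asarray(x).dtype, "params:", params,
                  "-> out:", np.shape(out))
        calls.append((np.shape(x), np.shape(out)))
        return out
    wrapper.calls = calls
    return wrapper

X = np.array([[x, 1, 2] for x in np.linspace(0, 10, 20)])
Y = 2 + np.linspace(0, 10, 20)
f = traced(lambda x, b, c: b*x[:, 0] + c*x[:, 1] + x[:, 2])

# wrapper takes *params, so curve_fit cannot infer the parameter count: pass p0
popt, pcov = opt.curve_fit(f, X, Y, p0=[1.0, 1.0])
print(len(f.calls), "calls, x shapes seen:", {c[0] for c in f.calls})
```

Every recorded call sees the whole (20, 3) array, never a single row.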

I see from the curve_fit source code that X must be float, or convertible to float:

xdata = np.asarray(xdata, float)    # or actually
np.asarray_chkfinite(xdata, float)  # same thing but a little more checking

Y must also be float.
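A quick way to see that conversion rule in isolation, reusing the question's MyClass (trimmed to its constructor): np.asarray(..., float) rejects an object array, while a nested list of numbers becomes a clean 2-D float array. A sketch:

```python
import numpy as np

class MyClass:
    def __init__(self, a1, a2, a3):
        self.A1, self.A2, self.A3 = a1, a2, a3

grid = np.linspace(0, 10, 20)
objs = np.array([MyClass(x, 1, 2) for x in grid])
print(objs.shape, objs.dtype)   # (20,) object

# The same conversion curve_fit applies to xdata fails for an object array
conversion_error = None
try:
    np.asarray(objs, float)
except TypeError as err:
    conversion_error = err
    print("TypeError:", err)

# A nested list of numbers converts cleanly: one row per sample, one column per value
nums = np.asarray([[x, 1, 2] for x in grid], float)
print(nums.shape, nums.dtype)   # (20, 3) float64
```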

edit

With your first block of code, I've added a method to the class definition.

def __repr__(self):
    return f'MyClass <{self.A1}, {self.A2}, {self.A3}>'

So the array displays more usefully:

In [5]: MyClass_array
Out[5]: 
array([MyClass <0.0, 1, 2>, MyClass <0.5263157894736842, 1, 2>,
       MyClass <1.0526315789473684, 1, 2>,
       MyClass <1.5789473684210527, 1, 2>,
       ...
       MyClass <9.473684210526315, 1, 2>, MyClass <10.0, 1, 2>],
      dtype=object)

Then when I try curve_fit:

In [6]: opt.curve_fit(f, MyClass_array, Y)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 opt.curve_fit(f, MyClass_array, Y)

File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:790, in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, full_output, **kwargs)
    786 if isinstance(xdata, (list, tuple, np.ndarray)):
    787     # `xdata` is passed straight to the user-defined `f`, so allow
    788     # non-array_like `xdata`.
    789     if check_finite:
--> 790         xdata = np.asarray_chkfinite(xdata, float)
    791     else:
    792         xdata = np.asarray(xdata, float)

File ~\anaconda3\lib\site-packages\numpy\lib\function_base.py:486, in asarray_chkfinite(a, dtype, order)
    422 @set_module('numpy')
    423 def asarray_chkfinite(a, dtype=None, order=None):
    424     """Convert the input to an array, checking for NaNs or Infs.
    425 
    426     Parameters
   (...)
    484 
    485     """
--> 486     a = asarray(a, dtype=dtype, order=order)
    487     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
    488         raise ValueError(
    489             "array must not contain infs or NaNs")
TypeError: float() argument must be a string or a number, not 'MyClass'

This is what I mean by the full error message. What I see is that it is trying to make a float dtype array (via asarray_chkfinite and asarray). astype produces the same error:

In [7]: MyClass_array.astype(float)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 MyClass_array.astype(float)

TypeError: float() argument must be a string or a number, not 'MyClass'

But suppose curve_fit left your array as is and passed it straight to f:

In [8]: f(MyClass_array,1,2)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[8], line 1
----> 1 f(MyClass_array,1,2)

Cell In[4], line 11, in <lambda>(x, b, c)
      8     def __repr__(self):
      9         return f'MyClass <{self.A1}, {self.A2}, {self.A3}>'
---> 11 f = lambda x, b, c: b*x.A1 + c*x.A2 + x.A3
     13 X = np.linspace(0,10,20)
     15 MyClass_array = np.array([MyClass(element,1,2) for element in X])

AttributeError: 'numpy.ndarray' object has no attribute 'A1'

You wrote f to work with one MyClass object, not with a whole array of them:

In [9]: f(MyClass_array[1],1,2)
Out[9]: 4.526315789473684
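A minimal sketch of one way out, assuming the attributes A1, A2, A3 are plain numbers: copy them into an (n, 3) float array and rewrite f to index columns, so the same formula is evaluated for every sample at once. f_vec is a hypothetical name:

```python
import numpy as np
import scipy.optimize as opt

class MyClass:
    def __init__(self, a1, a2, a3):
        self.A1, self.A2, self.A3 = a1, a2, a3

X = np.linspace(0, 10, 20)
objs = [MyClass(element, 1, 2) for element in X]

# Move the attributes into a plain (n, 3) float array, one column per attribute
xdata = np.array([[o.A1, o.A2, o.A3] for o in objs], dtype=float)

# Hypothetical f_vec: same formula as f, but column j holds attribute A(j+1) of every sample
f_vec = lambda x, b, c: b*x[:, 0] + c*x[:, 1] + x[:, 2]

Y = X + 2
print(f_vec(xdata[1:2], 1, 2))   # same value as f(MyClass_array[1], 1, 2) above
popt, pcov = opt.curve_fit(f_vec, xdata, Y)
print(popt)   # close to [1, 0]
```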

2nd try

In [10]: f = lambda x, b, c: b*x[0] + c*x[1] + x[2]
    ...: 
    ...: X = np.array([[element, 1, 2] for element in np.linspace(0,10,20)])
    ...: Y = 2 + np.linspace(0,10,20)
    ...: 
    ...: opt.curve_fit(f, X, Y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 6
      3 X = np.array([[element, 1, 2] for element in np.linspace(0,10,20)])
      4 Y = 2 + np.linspace(0,10,20)
----> 6 opt.curve_fit(f, X, Y)

File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:834, in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, full_output, **kwargs)
    831 if ydata.size != 1 and n > ydata.size:
    832     raise TypeError(f"The number of func parameters={n} must not"
    833                     f" exceed the number of data points={ydata.size}")
--> 834 res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
    835 popt, pcov, infodict, errmsg, ier = res
    836 ysize = len(infodict['fvec'])

File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:410, in leastsq(func, x0, args, Dfun, full_output, col_deriv, ftol, xtol, gtol, maxfev, epsfcn, factor, diag)
    408 if not isinstance(args, tuple):
    409     args = (args,)
--> 410 shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
    411 m = shape[0]
    413 if n > m:

File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:24, in _check_func(checker, argname, thefunc, x0, args, numinputs, output_shape)
     22 def _check_func(checker, argname, thefunc, x0, args, numinputs,
     23                 output_shape=None):
---> 24     res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
     25     if (output_shape is not None) and (shape(res) != output_shape):
     26         if (output_shape[0] != 1):

File ~\anaconda3\lib\site-packages\scipy\optimize\_minpack_py.py:485, in _wrap_func.<locals>.func_wrapped(params)
    484 def func_wrapped(params):
--> 485     return func(xdata, *params) - ydata

ValueError: operands could not be broadcast together with shapes (3,) (20,) 

Now X is a 2-D array of floats; Y is a 1-D array of floats.

In [11]: X.shape, X.dtype
Out[11]: ((20, 3), dtype('float64'))

In [12]: Y.shape, Y.dtype
Out[12]: ((20,), dtype('float64'))

With this f, the result (given the (20,3) X array) is a (3,) array:

In [13]: f(X,1,2)
Out[13]: array([2.10526316, 4.        , 8.        ])

The error comes when it tries to compare the result of the func call with the ydata:

func(xdata, *params) - ydata

It can't subtract a (20,) array from a (3,) array.
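A minimal illustration of the broadcasting rule behind that error, with zero arrays standing in for the real values: NumPy compares trailing dimensions, and 3 and 20 are neither equal nor 1.

```python
import numpy as np

pred = np.zeros(3)     # what the row-indexing f returns for a (20, 3) X
ydata = np.zeros(20)   # one target per sample

# Broadcasting compares trailing dimensions: 3 vs 20, neither is 1, so it fails
broadcast_error = None
try:
    pred - ydata
except ValueError as err:
    broadcast_error = err
    print(err)   # operands could not be broadcast together with shapes (3,) (20,)

# With one prediction per sample the subtraction is simply elementwise
print((np.zeros(20) - ydata).shape)   # (20,)
```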

The curve_fit docs clearly state that it expects func to behave like:

ydata = f(xdata, *params) + eps

Your f = lambda x, b, c: b*x[0] + c*x[1] + x[2], when given a (20,3)-shaped x, computes something from the first 3 rows of x.

In other words, it is:

f = lambda x, b, c: b*x[0,:] + c*x[1,:] + x[2,:]

Did you want this instead?

f = lambda x, b, c: b*x[:,0] + c*x[:,1] + x[:,2]

That f(X, 1, 2) should give a (20,) result, from which Y can be subtracted.
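A short sketch of the difference between the two indexings on the question's X, plus an equivalent spelling that unpacks the columns through X.T (x0, x1, x2 are illustrative names):

```python
import numpy as np

X = np.array([[element, 1, 2] for element in np.linspace(0, 10, 20)])

row = X[0]        # same as X[0, :]: the first sample, shape (3,)
col = X[:, 0]     # first column: one value per sample, shape (20,)
print(row.shape, col.shape)   # (3,) (20,)

# Unpacking the transpose gives the three columns as separate (20,) arrays
x0, x1, x2 = X.T
f = lambda x, b, c: b*x[:, 0] + c*x[:, 1] + x[:, 2]
print(np.allclose(f(X, 1, 2), 1*x0 + 2*x1 + x2))   # True
```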

3rd try

In [15]: X
Out[15]: 
array([[ 0.        ,  0.52631579,  1.05263158,  1.57894737,  2.10526316,
         2.63157895,  3.15789474,  3.68421053,  4.21052632,  4.73684211,
         5.26315789,  5.78947368,  6.31578947,  6.84210526,  7.36842105,
         7.89473684,  8.42105263,  8.94736842,  9.47368421, 10.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])

In [16]: X.shape, X.dtype
Out[16]: ((2, 20), dtype('float64'))

I could show the full curve_fit traceback, but let's check directly how f works with X. First I get a DeprecationWarning because you use np.int (it was only an alias for the builtin int and was removed in NumPy 1.24, so on newer versions that line raises an AttributeError instead):

In [20]: f(X,1,2)
C:\Users\paul\AppData\Local\Temp\ipykernel_4556\3065079927.py:3: DeprecationWarning: `np.int` ...
  f = lambda x, b, c: b*x[0] + c*Aux[np.int(x[1])][0] + Aux[np.int(x[1])][0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 f(X,1,2)

Cell In[14], line 3, in <lambda>(x, b, c)
      1 Aux = [[1,2],[1,2]]
----> 3 f = lambda x, b, c: b*x[0] + c*Aux[np.int(x[1])][0] + Aux[np.int(x[1])][0]
      5 x1 = np.linspace(0,10,20)
      6 x2 = np.zeros(20).astype('int')

TypeError: only size-1 arrays can be converted to Python scalars

As with the first case, you are assuming that curve_fit passes just one "sample" to f, whereas it really passes the whole array. X[1] is an array of 20 values.

In [21]: int(X[1])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[21], line 1
----> 1 int(X[1])

TypeError: only size-1 arrays can be converted to Python scalars

The common factor in all these errors is that you did not verify that your f works with X. You seem to be under the impression that curve_fit passes one element or row of X at a time.
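A minimal sketch of a vectorized version of the 3rd try, assuming the intent is one Aux lookup per sample: convert the whole index row with astype(int) and use NumPy fancy indexing instead of int(). The expression keeps the question's Aux[...][0] in both places, as written there:

```python
import numpy as np
import scipy.optimize as opt

Aux = np.array([[1, 2], [1, 2]], dtype=float)   # condition table from the question

def f(x, b, c):
    # x is the whole (2, n) array: row 0 holds the x-values, row 1 the indices into Aux
    idx = x[1].astype(int)          # convert every index at once (no int() on an array)
    cond = Aux[idx, 0]              # fancy indexing: one condition value per sample
    return b*x[0] + c*cond + cond   # same expression as the question's lambda

x1 = np.linspace(0, 10, 20)
x2 = np.zeros(20, dtype=int)
X = np.array([x1, x2])              # (2, 20); x2 becomes float here, hence astype(int)
Y = 2 + x1

print(f(X, 1, 2).shape)             # (20,)
popt, pcov = opt.curve_fit(f, X, Y)
print(popt)                         # close to [1, 1]
```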

2nd try revisited

The docs are a bit unclear whether X has to be (20,), or whether it can be (20,3). Let's define f to work with a 3-column X:

In [24]: f = lambda x, b, c: b*x[:,0] + c*x[:,1] + x[:,2]

Then a trial call of f produces a (20,) array, which can be tested against Y:

In [25]: f(X, 1, 2)
Out[25]: 
array([ 4.        ,  4.52631579,  5.05263158,  5.57894737,  6.10526316,...
       11.89473684, 12.42105263, 12.94736842, 13.47368421, 14.        ])

In [26]: f(X, 1, 2)-Y
Out[26]: 
array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2.])

And if we do the curve_fit:

In [27]: opt.curve_fit(f, X, Y)
Out[27]: 
(array([ 1.00000000e+00, -2.18110054e-12]),
 array([[ 2.11191137e-27, -2.44049320e-31],
        [-2.44049320e-31,  1.93496881e-33]]))

It runs and gives a result. Testing the 2 parameters it found:

In [28]: f(X, 1, 0)
Out[28]: 
array([ 2.        ,  2.52631579,  3.05263158,  3.57894737,  4.10526316,...
        9.89473684, 10.42105263, 10.94736842, 11.47368421, 12.        ])

In [29]: f(X, 1, 0)-Y
Out[29]: 
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.])

This is something of a null case, since your Y doesn't have any noise; it's just a function of np.linspace(0,10,20), the same as X. But it does show that curve_fit works with an (n,3) X, provided f is correct.
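Applied to the original problem (several X[i], Y[i] arrays, each with its own C[i]), the same idea suggests concatenating everything into one (N, 1 + len(C[i])) float xdata, repeating each dataset's conditions on every one of its rows. A minimal sketch under stated assumptions: equation is a hypothetical linear model, the data are synthetic and noise-free, and each C[i] is a fixed-length tuple of numbers:

```python
import numpy as np
import scipy.optimize as opt

def equation(x, c0, c1, k0, k1):
    # Hypothetical model: c0, c1 are known per-dataset conditions, k0, k1 are fitted
    return k0*x + k1*c0 + c1

# Hypothetical datasets with different lengths and different conditions
X_list = [np.linspace(0, 10, 20), np.linspace(-5, 5, 15)]
C_list = [(1.0, 2.0), (3.0, -1.0)]
true_K = (0.5, 1.5)
Y_list = [equation(x, *c, *true_K) for x, c in zip(X_list, C_list)]

# Stack into one (N, 3) xdata: column 0 is x, columns 1-2 repeat that dataset's conditions
xdata = np.concatenate([
    np.column_stack([x, np.tile(c, (x.size, 1))]) for x, c in zip(X_list, C_list)
])
ydata = np.concatenate(Y_list)

def model(xdata, k0, k1):
    # Unpack the columns and evaluate every sample at once
    return equation(xdata[:, 0], xdata[:, 1], xdata[:, 2], k0, k1)

popt, pcov = opt.curve_fit(model, xdata, ydata)
print(xdata.shape, popt)   # (35, 3), popt close to true_K
```

The conditions ride along as extra columns, so curve_fit still sees a single float array and model only has to index columns.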
