
How to speed up python curve_fit over a 2D array?

I have to use the scipy curve_fit function over a large set of data (5,000,000 fits). So basically I've created a 2D array: the first dimension is the number of fittings to perform, the second dimension is the number of points used for each fit.

import numpy as np
from scipy.optimize import curve_fit

t = np.array([0, 1, 2, 3, 4])

for d in np.ndindex(data.shape[0]):
  try:
    popt, pcov = curve_fit(func, t, np.squeeze(data[d,:]), p0=[1000,100])
  except RuntimeError:
    print("Error - curve_fit failed")

Multiprocessing can be used to speed up the full process, but it is still quite slow. Is there a way to use curve_fit in a "vectorized" manner?

One way to speed it up is by adding in some prior knowledge to curve_fit.

If you know the range you expect your parameters to fall in, and if you don't need precision up to the 100th significant digit, you can speed up your computations massively.

Here is an example, in which you'd be fitting for param1 and param2:

t = np.array([0, 1, 2, 3, 4])
def func(t, param1, param2):
  return param1*t + param2*np.exp(t)

for d in np.ndindex(data.shape[0]):
  try:
    popt, pcov = curve_fit(func, t, np.squeeze(data[d,:]), p0=[1000,100], 
                           bounds=([min_param1, min_param2],[max_param1, max_param2]),
                           ftol=0.5, xtol=0.5)
  except RuntimeError:
    print("Error - curve_fit failed")

Note the extra keyword arguments bounds, ftol and xtol. You can read about them here.

curve_fit extends the functionality of scipy.optimize.leastsq, which is itself a wrapper for the underlying MINPACK lmdif and lmder Fortran routines. It looks like multi-threading is not possible; check out this link, which says:

The underlying Fortran 77 routines (MINPACK lmder.f and lmdif.f) are not reentrant, so the GIL cannot be released. (Thus no chance of parallel processing with threads.)

There is still an open ticket to develop this, but it looks like it cannot be finished... You would either need to use a different library or write a wrapper/function in lower-level code. There are papers on implementations of parallel Levenberg-Marquardt algorithms.

Maybe there is another solution: use less data, or, as a rough estimate, randomly split your data into parts, curve fit each part in a separate process (with multiprocessing), and average the coefficients at the end.
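A minimal sketch of that multiprocessing idea, assuming the func and t from the question above; the synthetic data, the helper name fit_one and the chunksize value are illustrative, not part of the original code:

```python
import numpy as np
from multiprocessing import Pool
from scipy.optimize import curve_fit

t = np.array([0, 1, 2, 3, 4], dtype=float)

def func(t, param1, param2):
  return param1 * t + param2 * np.exp(t)

def fit_one(row):
  # Fit a single curve; return None when the fit does not converge.
  try:
    popt, _ = curve_fit(func, t, row, p0=[1000, 100])
    return popt
  except RuntimeError:
    return None

if __name__ == "__main__":
  # Synthetic data standing in for the real 2D array (one curve per row).
  rng = np.random.default_rng(0)
  data = np.array([func(t, a, b) for a, b in rng.uniform(1, 10, size=(100, 2))])
  with Pool() as pool:
    results = pool.map(fit_one, data, chunksize=10)
```

Each worker fits an independent slice of rows, so the speedup is roughly linear in the number of cores, minus the cost of pickling the rows to the workers.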

In my experience, you should supply the Jacobian to curve_fit if possible. It saves time by avoiding repeated calls to func to estimate the Jacobian numerically, and it gives a significant speed boost, especially if you are dealing with a large number of optimizable parameters.
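As a sketch: for the func used in the earlier answer the Jacobian is analytic (derivative t with respect to param1, exp(t) with respect to param2), and curve_fit accepts it through the jac keyword:

```python
import numpy as np
from scipy.optimize import curve_fit

t = np.array([0, 1, 2, 3, 4], dtype=float)

def func(t, param1, param2):
  return param1 * t + param2 * np.exp(t)

def jac(t, param1, param2):
  # Partial derivatives of func: d/d param1 = t, d/d param2 = exp(t).
  return np.column_stack((t, np.exp(t)))

y = func(t, 2.0, 3.0)
popt, pcov = curve_fit(func, t, y, p0=[1000, 100], jac=jac)
```

With jac supplied, the solver no longer spends extra func evaluations on finite-difference approximations at every iteration.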
