
Curve Fitting For 3 dimensional data in python

I have X and Y data with shape (7, 360, 720) (a global grid of cells at 0.5° resolution) as input data, and I want to fit a sigmoid curve with the code below and obtain the curve parameters in the same (360, 720) grid shape as X and Y:

# -*- coding: utf-8 -*-
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, a, b, c):
    return a + b * (1 - np.exp(-c * x**2))

# x and y are the (7, 360, 720) input arrays
f = open('test.csv', 'w')
for i in range(360):
    for j in range(720):
        xdata = [0] + [x[k, i, j] for k in range(7)]
        ydata = [0] + [y[k, i, j] for k in range(7)]
        popt, pcov = curve_fit(sigmoid, xdata, ydata)
        print(popt)
        f.write(','.join(map(str, popt)))
        f.write("\n")
f.close()

This code writes and stores the fitting results in a .csv file with 3 columns (a, b, c), but I want to write and store the results in a file with shape (360, 720), matching the grid cells. The code also raises this error: RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 800.

The dimensionality of your data (referenced in the title of the question) is not the cause of the problem you are seeing. What you are trying to do is run 360*720 (~260,000) separate fits of your sigmoidal function, with inputs derived from your arrays x and y. It should work, but it might be slow simply because you are doing so many fits.

If you haven't already, you should definitely start by fitting a couple of arrays to your function -- if you can't get 3 to work, there's no point in trying 260,000, right? So, start with 1, then try 3, then 360, then all of them.

I suspect the problem you are seeing is because curve_fit() stupidly allows you to not explicitly specify starting values for your parameters, and even-more-stupidly assigns unspecified starting values to the arbitrary value of 1. This encourages new users to not think more carefully about the problem they are trying to solve, and then gives cryptic error messages like the one you are seeing that do not explicitly say "you need better starting values". The message says the fit took many iterations, which probably means the fit "got lost" trying to find optimal values. That "getting lost" is probably because it started "too far from home".

In general, curve fitting is sensitive to the starting values of the parameters. And you probably do know better starting values than a=1, b=1, c=1. I suspect that you also know that exponentiation can get huge or tiny very quickly. Because of this, and depending on the scale of your x, there are probably ranges of values for c that are not really sensible -- it might be that c should be positive, and smaller than 10, for example. Again, you probably sort of know these ranges.
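To illustrate, even staying with plain scipy, passing explicit starting values (`p0`) and bounds to curve_fit() is often enough to fix the maxfev error. This is a minimal sketch on synthetic data standing in for one grid cell (the true parameter values here are arbitrary, chosen only for the demo):

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, a, b, c):
    return a + b * (1 - np.exp(-c * x**2))

# Synthetic stand-in for one grid cell's data.
rng = np.random.default_rng(0)
xdata = np.linspace(0, 3, 8)
ydata = sigmoid(xdata, 0.5, 2.0, 0.8) + rng.normal(0, 0.01, xdata.size)

# Give curve_fit sensible starting values instead of the default (1, 1, 1),
# and bound c to a plausible range (here: positive and below 10).
popt, pcov = curve_fit(sigmoid, xdata, ydata,
                       p0=[0.0, 1.0, 0.5],
                       bounds=([-np.inf, 0, 0], [np.inf, np.inf, 10]))
print(popt)
```

The key point is that `p0` should come from what you know about your data: a is roughly the value at x=0, a+b the saturation value, and c sets how fast the curve saturates.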

Let me suggest using lmfit (https://lmfit.github.io/lmfit-py/) for this work. It provides an alternative approach to curve fitting, with many useful improvements over curve_fit. For your problem, a single fit might look like this:

import numpy as np
from lmfit import Model

def sigmoid(x, offset, scale, decay):
    return offset + scale*(1 - np.exp(-decay*(x**2)))

## set up the model and parameters from your model function
# note that parameters will be *named* using the names of the
# arguments to your model function.
model = Model(sigmoid)
# make parameters (OrderedDict-like) with initial values
params = model.make_params(offset=0, scale=1, decay=0.25) 

# you may want to set bounds on some of the parameters
params['scale'].min = 0
params['decay'].min = 0
params['decay'].max = 5

# you can also fix some parameters if desired
# params['offset'].vary = False 

## set up data
# pick arbitrary data to fit, and make sure data use np arrays.
# but also: (0, 0) isn't in your data -- do you need to assert it?
# won't that drive `offset` to 0?
i, j = 7, 12
xdata = np.concatenate(([0], x[:, i, j]))
ydata = np.concatenate(([0], y[:, i, j]))

# now fit model to data, get results
result = model.fit(ydata, params, x=xdata)

print(result.fit_report())

This will print out a report with fit statistics, best-fit parameter values, and uncertainties. You can read the docs for all the components of result, but result.params holds the best-fit parameters and uncertainties.

For use in a loop, this approach has the convenient feature that result is unique for each data set, while the starting params are not altered by the fit and can be re-used as starting values for all your fits. A test loop might look like

results = []
for i in (50, 150, 250):
    for j in (200, 400, 600):
        xdata = np.concatenate(([0], x[:, i, j]))
        ydata = np.concatenate(([0], y[:, i, j]))

        result = model.fit(ydata, params, x=xdata)
        results.append([i, j, result.params, result.chisqr])

It will still be possible that some of the 260,000 fits will not succeed, but I think that lmfit will give you better tools to avoid and identify these cases.
