简体   繁体   中英

Getting more refined results from Python SciPy curve_fit

I've got the following bit of Python (v2.7.14) code, which uses curve_fit from SciPy (v1.0.1) to find parameters for an exponential decay function. Most of the time, I get reasonable results. Occasionally though, I'll get some results which are completely out of my expected range, even though the found parameters will look fine when plotted against the original graph.

First, my understanding of the exponential decay formula comes from https://en.wikipedia.org/wiki/Exponential_decay which I've translated to Python as:


y = a * numpy.exp(-b * x) + c

Where by:

  • a is the initial value of the data
  • b is the decay rate, which is the inverse of when the signal gets to 1/e from initial value
  • c is an offset, as I am dealing with non-negative values in my data which never reach zero
  • x is the current time

The script takes into account that non-negative data is being fitted and offsets the initial guess appropriately. But even without guessing, not offsetting, using max/min (instead of first/last values) and other random things I've tried, I cannot seem to get curve_fit to produce sensible values on the troublesome datasets.

My hypothesis is that the troublesome datasets don't have enough of a curve that can be fit without going way outside the realm of the data. I've looked at the bounds argument for curve_fit, and thought that might be a reasonable option. I'm unsure as to what would make good lower and upper bounds for the calculation, or if it is actually the option I am looking for.

Here is the code. Commented out code are things I've tried.


#!/usr/local/bin/python

import numpy as numpy
from scipy.optimize import curve_fit
import matplotlib.pyplot as pyplot

def exponential_decay(x, a, b, c):
    return a * numpy.exp(-b * x) + c

def fit_exponential(decay_data, time_data, decay_time):
    # The start of the curve is offset by the last point, so subtract
    guess_a = decay_data[0] - decay_data[-1]
    #guess_a = max(decay_data) - min(decay_data)

    # The time that it takes for the signal to reach 1/e becomes guess_b
    guess_b = 1/decay_time

    # Since this is non-negative data, above 0, we use the last data point as the baseline (c)
    guess_c = decay_data[-1]
    #guess_c = min(decay_data)

    guess=[guess_a, guess_b, guess_c]
    print "guess: {0}".format(guess)

    #popt, pcov = curve_fit(exponential_decay, time_data, decay_data, maxfev=20000)
    popt, pcov = curve_fit(exponential_decay, time_data, decay_data, p0=guess, maxfev=20000)

    #bound_lower = [0.05, 0.05, 0.05]
    #bound_upper = [decay_data[0]*2, guess_b * 10, decay_data[-1]]
    #print "bound_lower: {0}".format(bound_lower)
    #print "bound_upper: {0}".format(bound_upper)
    #popt, pcov = curve_fit(exponential_decay, time_data, decay_data, p0=guess, bounds=[bound_lower, bound_upper], maxfev=20000)

    a, b, c = popt

    print "a: {0}".format(a)
    print "b: {0}".format(b)
    print "c: {0}".format(c)

    plot_fit = exponential_decay(time_data, a, b, c)

    pyplot.plot(time_data, decay_data, 'g', label='Data')
    pyplot.plot(time_data, plot_fit, 'r', label='Fit')
    pyplot.legend()
    pyplot.show()

print "Gives reasonable results"
time_data = numpy.array([0.0,0.040000000000000036,0.08100000000000018,0.12200000000000011,0.16200000000000014,0.20300000000000007,0.2430000000000001,0.28400000000000003,0.32400000000000007,0.365,0.405,0.44599999999999995,0.486,0.5269999999999999,0.567,0.6079999999999999,0.6490000000000002,0.6889999999999998,0.7300000000000002,0.7700000000000002,0.8110000000000002,0.8510000000000002,0.8920000000000001,0.9320000000000002,0.9730000000000001])
decay_data = numpy.array([1.342146870531986,1.405586070225509,1.3439802492549762,1.3567811728250267,1.2666276377825874,1.1686375326985337,1.216119360088685,1.2022841507836042,1.1926979408026064,1.1544395213303447,1.1904416926531907,1.1054720201415882,1.112100683833435,1.0811434035632939,1.1221671794680403,1.0673295063196415,1.0036146509494743,0.9984005680821595,1.0134498134883763,0.9996920772051201,0.929782730581616,0.9646581154122312,0.9290690593684447,0.8907360533169936,0.9121560047238627])
fit_exponential(decay_data, time_data, 0.567)

print

print "Gives results that are way outside my expectations"
time_data = numpy.array([0.0,0.040000000000000036,0.08099999999999996,0.121,0.16199999999999992,0.20199999999999996,0.24300000000000033,0.28300000000000036,0.32399999999999984,0.3650000000000002,0.40500000000000025,0.44599999999999973,0.48599999999999977,0.5270000000000001,0.5670000000000002,0.6079999999999997,0.6479999999999997,0.6890000000000001,0.7290000000000001,0.7700000000000005,0.8100000000000005,0.851,0.8920000000000003,0.9320000000000004,0.9729999999999999,1.013,1.0540000000000003])
decay_data = numpy.array([1.4401611921948776,1.3720688158534153,1.3793465463227048,1.2939909686762128,1.3376345321949346,1.3352710161631154,1.3413634841956348,1.248705138603995,1.2914294791901497,1.2581763134585313,1.246975264018646,1.2006447776495062,1.188232179689515,1.1032789127515186,1.163294324147017,1.1686263160765304,1.1434009568472243,1.0511578409946472,1.0814520440570896,1.1035953824496334,1.0626893599266163,1.0645580326776076,0.994855722989818,0.9959891485338087,0.9394584009825916,0.949504060086646,0.9278639431146273])
fit_exponential(decay_data, time_data, 0.6890000000000001)

And here is the text output:


Gives reasonable results
guess: [0.4299908658081232, 1.7636684303350971, 0.9121560047238627]
a: 1.10498934435
b: 0.583046565885
c: 0.274503681044

Gives results that are way outside my expectations
guess: [0.5122972490802503, 1.4513788098693758, 0.9278639431146273]
a: 742.824622191
b: 0.000606308344957
c: -741.41398516

Most notably, with the second set of results, the value for a is very high, with the value for c being equally low on the negative scale, and b being a very small decimal number.

Here is the graph of the first dataset, which gives reasonable results. 这是第一个数据集的图形,给出了合理的结果。

Here is the graph of the second dataset, which does not give good results. 这是第二个数据集的图形,效果不佳。

Note that the graph itself plots correctly, though the line does not really have a good curve to it.

My questions:

  • Is my implementation of the exponential decay algorithm with curve_fit correct?
  • Are my initial guess parameters good enough?
  • Is the bounds parameter the correct solution for this problem? If so, what is a good way to determine lower and upper bounds?
  • Have I missed something here?

Again, thank you!

When you say that the second fit gives results that are "way outside" of your expectations and that although the second graph "plots correctly" the line does not really "have a good curve fit" you are on the right track to understanding what is going on. I think you are just missing a piece of the puzzle.

The second graph is fit pretty well by a curve that does look linear. That probably means that you don't really have enough change in your data (well, perhaps below the noise level) to detect that it is an exponential decay.

I would bet that if you printed out not only the best-fit values but also the uncertainties and correlations for the variables that you would see that the uncertainties are huge and some of the correlations are very close to 1. That may mean that taking into account the uncertainties (and measurements always have uncertainties) the results might actually fit with your expectation. And that may also tell you that the data you have does not support an exponential decay very well.

You might also try other models for this data ("linear" comes to mind ;)) and compare goodness-of-fit statistics such as chi-square and Akaike information criterion.

scipy.curve_fit does return the covariance matrix -- the pcov that you did not use in your example. Unfortunately, scipy.curve_fit does not convert these values into uncertainties and correlation values, and it does not attempt to return any goodness-of-fit statistics at all.

To fully explain any fit to data, you need not only the best-fit values but also an estimate of the uncertainties for the variable parameters. And you need the goodness-of-fit statistics in order to determine if a fit is good, or at least whether one fit is better than another.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM