Getting more refined results from Python SciPy curve_fit

I've got the following bit of Python (v2.7.14) code, which uses curve_fit from SciPy (v1.0.1) to find parameters for an exponential decay function. Most of the time, I get reasonable results. Occasionally, though, I get results that are completely outside my expected range, even though the fitted curve looks fine when plotted against the original data.

First, my understanding of the exponential decay formula comes from https://en.wikipedia.org/wiki/Exponential_decay, which I've translated to Python as:


y = a * numpy.exp(-b * x) + c

Where:

  • a is the initial value of the data
  • b is the decay rate, which is the inverse of the time at which the signal reaches 1/e of its initial value
  • c is an offset, as I am dealing with non-negative values in my data which never reach zero
  • x is the current time

The script takes into account that non-negative data is being fitted and offsets the initial guess appropriately. But even without a guess, without offsetting, using max/min (instead of first/last values), and with other random things I've tried, I cannot seem to get curve_fit to produce sensible values on the troublesome datasets.

My hypothesis is that the troublesome datasets don't have enough of a curve to be fit without going way outside the realm of the data. I've looked at the bounds argument for curve_fit, and thought that might be a reasonable option. I'm unsure what would make good lower and upper bounds for the calculation, or whether it is actually the option I am looking for.

Here is the code. The commented-out lines are things I've tried.


#!/usr/local/bin/python

import numpy as numpy
from scipy.optimize import curve_fit
import matplotlib.pyplot as pyplot

def exponential_decay(x, a, b, c):
    return a * numpy.exp(-b * x) + c

def fit_exponential(decay_data, time_data, decay_time):
    # The start of the curve is offset by the last point, so subtract
    guess_a = decay_data[0] - decay_data[-1]
    #guess_a = max(decay_data) - min(decay_data)

    # The time that it takes for the signal to reach 1/e becomes guess_b
    guess_b = 1/decay_time

    # Since this is non-negative data, above 0, we use the last data point as the baseline (c)
    guess_c = decay_data[-1]
    #guess_c = min(decay_data)

    guess=[guess_a, guess_b, guess_c]
    print "guess: {0}".format(guess)

    #popt, pcov = curve_fit(exponential_decay, time_data, decay_data, maxfev=20000)
    popt, pcov = curve_fit(exponential_decay, time_data, decay_data, p0=guess, maxfev=20000)

    #bound_lower = [0.05, 0.05, 0.05]
    #bound_upper = [decay_data[0]*2, guess_b * 10, decay_data[-1]]
    #print "bound_lower: {0}".format(bound_lower)
    #print "bound_upper: {0}".format(bound_upper)
    #popt, pcov = curve_fit(exponential_decay, time_data, decay_data, p0=guess, bounds=[bound_lower, bound_upper], maxfev=20000)

    a, b, c = popt

    print "a: {0}".format(a)
    print "b: {0}".format(b)
    print "c: {0}".format(c)

    plot_fit = exponential_decay(time_data, a, b, c)

    pyplot.plot(time_data, decay_data, 'g', label='Data')
    pyplot.plot(time_data, plot_fit, 'r', label='Fit')
    pyplot.legend()
    pyplot.show()

print "Gives reasonable results"
time_data = numpy.array([0.0,0.040000000000000036,0.08100000000000018,0.12200000000000011,0.16200000000000014,0.20300000000000007,0.2430000000000001,0.28400000000000003,0.32400000000000007,0.365,0.405,0.44599999999999995,0.486,0.5269999999999999,0.567,0.6079999999999999,0.6490000000000002,0.6889999999999998,0.7300000000000002,0.7700000000000002,0.8110000000000002,0.8510000000000002,0.8920000000000001,0.9320000000000002,0.9730000000000001])
decay_data = numpy.array([1.342146870531986,1.405586070225509,1.3439802492549762,1.3567811728250267,1.2666276377825874,1.1686375326985337,1.216119360088685,1.2022841507836042,1.1926979408026064,1.1544395213303447,1.1904416926531907,1.1054720201415882,1.112100683833435,1.0811434035632939,1.1221671794680403,1.0673295063196415,1.0036146509494743,0.9984005680821595,1.0134498134883763,0.9996920772051201,0.929782730581616,0.9646581154122312,0.9290690593684447,0.8907360533169936,0.9121560047238627])
fit_exponential(decay_data, time_data, 0.567)

print

print "Gives results that are way outside my expectations"
time_data = numpy.array([0.0,0.040000000000000036,0.08099999999999996,0.121,0.16199999999999992,0.20199999999999996,0.24300000000000033,0.28300000000000036,0.32399999999999984,0.3650000000000002,0.40500000000000025,0.44599999999999973,0.48599999999999977,0.5270000000000001,0.5670000000000002,0.6079999999999997,0.6479999999999997,0.6890000000000001,0.7290000000000001,0.7700000000000005,0.8100000000000005,0.851,0.8920000000000003,0.9320000000000004,0.9729999999999999,1.013,1.0540000000000003])
decay_data = numpy.array([1.4401611921948776,1.3720688158534153,1.3793465463227048,1.2939909686762128,1.3376345321949346,1.3352710161631154,1.3413634841956348,1.248705138603995,1.2914294791901497,1.2581763134585313,1.246975264018646,1.2006447776495062,1.188232179689515,1.1032789127515186,1.163294324147017,1.1686263160765304,1.1434009568472243,1.0511578409946472,1.0814520440570896,1.1035953824496334,1.0626893599266163,1.0645580326776076,0.994855722989818,0.9959891485338087,0.9394584009825916,0.949504060086646,0.9278639431146273])
fit_exponential(decay_data, time_data, 0.6890000000000001)

And here is the text output:


Gives reasonable results
guess: [0.4299908658081232, 1.7636684303350971, 0.9121560047238627]
a: 1.10498934435
b: 0.583046565885
c: 0.274503681044

Gives results that are way outside my expectations
guess: [0.5122972490802503, 1.4513788098693758, 0.9278639431146273]
a: 742.824622191
b: 0.000606308344957
c: -741.41398516

Most notably, in the second set of results, the value for a is very high, the value for c is equally large in the negative direction, and b is a very small decimal number.

Here is the graph of the first dataset, which gives reasonable results.

Here is the graph of the second dataset, which does not give good results.

Note that the graph itself plots correctly, though the line does not really have a good curve to it.

My questions:

  • Is my implementation of the exponential decay algorithm with curve_fit correct?
  • Are my initial guess parameters good enough?
  • Is the bounds parameter the correct solution for this problem? If so, what is a good way to determine lower and upper bounds?
  • Have I missed something here?

Again, thank you!

When you say that the second fit gives results that are "way outside" your expectations, and that although the second graph "plots correctly" the line does not really "have a good curve" to it, you are on the right track to understanding what is going on. I think you are just missing a piece of the puzzle.

The second graph is fit pretty well by a curve that looks linear. That probably means that you don't really have enough change in your data (or, perhaps, the change is below the noise level) to detect that it is an exponential decay.
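A rough sketch of why the fit looks like a line, not a proof: when b * x stays small over the whole time range, a * exp(-b * x) + c is approximately (a + c) - (a * b) * x. The snippet plugs in the a, b and c printed for the second dataset in the question; the names intercept, slope and max_gap are mine.

```python
import numpy

# Best-fit values printed for the second dataset in the question
a = 742.824622191
b = 0.000606308344957
c = -741.41398516

# First-order expansion, valid when b * x << 1:
#   a * exp(-b * x) + c  ~=  (a + c) - (a * b) * x
intercept = a + c
slope = -a * b

# Time range covered by the second dataset (0 to about 1.054)
x = numpy.linspace(0.0, 1.054, 50)
exponential = a * numpy.exp(-b * x) + c
line = intercept + slope * x

# Largest gap between the fitted exponential and its straight-line approximation
max_gap = numpy.max(numpy.abs(exponential - line))

print("intercept: {0:.4f}".format(intercept))
print("slope: {0:.4f}".format(slope))
print("max gap: {0:.1e}".format(max_gap))
```

The huge a and the hugely negative c cancel to an intercept of about 1.41, and a * b gives a slope of about -0.45, so the "exponential" that curve_fit found is, over this time range, a straight line to within a small fraction of the scatter in the data.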

I would bet that if you printed out not only the best-fit values but also the uncertainties and correlations for the variables, you would see that the uncertainties are huge and that some of the correlations are very close to 1 in magnitude. That may mean that, taking the uncertainties into account (and measurements always have uncertainties), the results might actually fit with your expectation. It may also tell you that the data you have does not support an exponential decay very well.

You might also try other models for this data ("linear" comes to mind ;)) and compare goodness-of-fit statistics such as chi-square and the Akaike information criterion.
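A minimal sketch of such a comparison, under some assumptions of mine: chi-square here is the plain unweighted sum of squared residuals (the data come without per-point errors), and the AIC is the common least-squares form n * ln(chi-square / n) + 2 * k, which drops an additive constant. The exponential model reuses the a, b and c printed in the question rather than refitting; the helper names chi_square and aic are hypothetical.

```python
import numpy

def exponential_decay(x, a, b, c):
    return a * numpy.exp(-b * x) + c

def chi_square(observed, predicted):
    # Unweighted chi-square: sum of squared residuals (no per-point errors known)
    return numpy.sum((observed - predicted) ** 2)

def aic(n_points, n_params, chisqr):
    # Least-squares form of the Akaike information criterion, up to a constant
    return n_points * numpy.log(chisqr / n_points) + 2 * n_params

# Second dataset from the question
time_data = numpy.array([0.0,0.040000000000000036,0.08099999999999996,0.121,0.16199999999999992,0.20199999999999996,0.24300000000000033,0.28300000000000036,0.32399999999999984,0.3650000000000002,0.40500000000000025,0.44599999999999973,0.48599999999999977,0.5270000000000001,0.5670000000000002,0.6079999999999997,0.6479999999999997,0.6890000000000001,0.7290000000000001,0.7700000000000005,0.8100000000000005,0.851,0.8920000000000003,0.9320000000000004,0.9729999999999999,1.013,1.0540000000000003])
decay_data = numpy.array([1.4401611921948776,1.3720688158534153,1.3793465463227048,1.2939909686762128,1.3376345321949346,1.3352710161631154,1.3413634841956348,1.248705138603995,1.2914294791901497,1.2581763134585313,1.246975264018646,1.2006447776495062,1.188232179689515,1.1032789127515186,1.163294324147017,1.1686263160765304,1.1434009568472243,1.0511578409946472,1.0814520440570896,1.1035953824496334,1.0626893599266163,1.0645580326776076,0.994855722989818,0.9959891485338087,0.9394584009825916,0.949504060086646,0.9278639431146273])
n = len(time_data)

# Exponential model (3 parameters), using the values printed in the question
exp_fit = exponential_decay(time_data, 742.824622191, 0.000606308344957, -741.41398516)
exp_chisqr = chi_square(decay_data, exp_fit)

# Straight-line model (2 parameters), fitted by ordinary least squares
slope, intercept = numpy.polyfit(time_data, decay_data, 1)
lin_chisqr = chi_square(decay_data, slope * time_data + intercept)

print("exponential: chi-square {0:.5f}, AIC {1:.2f}".format(exp_chisqr, aic(n, 3, exp_chisqr)))
print("linear:      chi-square {0:.5f}, AIC {1:.2f}".format(lin_chisqr, aic(n, 2, lin_chisqr)))
```

Lower AIC is better. When the two chi-square values come out nearly equal, the model with fewer parameters wins, because each extra parameter adds 2 to the AIC without buying a better fit.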

scipy.optimize.curve_fit does return the covariance matrix: the pcov that you did not use in your example. Unfortunately, curve_fit does not convert these values into uncertainties and correlation values, and it does not attempt to return any goodness-of-fit statistics at all.
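The conversion itself is short. As a sketch: the one-sigma uncertainties are the square roots of the diagonal of pcov, and each correlation is a covariance divided by the product of the two uncertainties. The helper name summarize_covariance and the example matrix are made up for illustration; in the question's script you would pass in the pcov returned by curve_fit.

```python
import numpy

def summarize_covariance(pcov):
    """Return (standard errors, correlation matrix) for a covariance matrix."""
    pcov = numpy.asarray(pcov, dtype=float)
    # One-sigma uncertainty of each parameter: square root of the diagonal
    stderr = numpy.sqrt(numpy.diag(pcov))
    # Correlation between parameters i and j: cov[i, j] / (stderr[i] * stderr[j])
    correlation = pcov / numpy.outer(stderr, stderr)
    return stderr, correlation

# Made-up covariance matrix for (a, b, c), only to show the arithmetic;
# it is NOT the result of fitting the data in the question.
example_pcov = numpy.array([[4.0, 0.0, -3.8],
                            [0.0, 0.01, 0.0],
                            [-3.8, 0.0, 4.0]])

stderr, correlation = summarize_covariance(example_pcov)
for name, err in zip("abc", stderr):
    print("{0}: +/- {1:.3f}".format(name, err))
print("correlation(a, c): {0:.2f}".format(correlation[0, 2]))
```

In this made-up example a and c each have an uncertainty of 2 and a correlation of -0.95, which is the kind of pattern to look for: two parameters that can trade off against each other almost freely without changing the fitted curve.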

To fully explain any fit to data, you need not only the best-fit values but also an estimate of the uncertainties for the variable parameters. And you need the goodness-of-fit statistics in order to determine if a fit is good, or at least whether one fit is better than another.
