
Python Curve_Fit Exponential / Power / Log Curve - Improve Results

I am trying to fit this data which is asymptotically approaching zero (but never reaching it).

I believe the best curve is an inverse logistic function, but I am open to suggestions. The key feature is the decaying "S-curve" shape, which is expected.

Here is the code I have so far, along with the plot image below, which shows a pretty ugly fit.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# DATA

x = pd.Series([1,1,264,882,913,1095,1156,1217,1234,1261,1278,1460,1490,1490,1521,1578,1612,1612,1668,1702,1704,1735,1793,2024,2039,2313,2313,2558,2558,2617,2617,2708,2739,2770,2770,2831,2861,2892,2892,2892,2892,2892,2923,2923,2951,2951,2982,2982,3012,3012,3012,3012,3012,3012,3012,3073,3073,3073,3104,3104,3104,3104,3135,3135,3135,3135,3165,3165,3165,3165,3165,3196,3196,3196,3226,3226,3257,3316,3347,3347,3347,3347,3377,3377,3438,3469,3469]).values
y = pd.Series([1000,600,558.659217877095,400,300,100,7.75,6,8.54,6.66666666666667,7.14,1.1001100110011,1.12,0.89,1,2,0.666666666666667,0.77,1.12612612612613,0.7,0.664010624169987,0.65,0.51,0.445037828215398,0.27,0.1,0.26,0.1,0.1,0.13,0.16,0.1,0.13,0.1,0.12,0.1,0.13,0.14,0.14,0.17,0.11,0.15,0.09,0.1,0.26,0.16,0.09,0.09,0.05,0.09,0.09,0.1,0.1,0.11,0.11,0.09,0.09,0.11,0.08,0.09,0.09,0.1,0.06,0.07,0.07,0.09,0.05,0.05,0.06,0.07,0.08,0.08,0.07,0.1,0.08,0.08,0.05,0.06,0.04,0.04,0.05,0.05,0.04,0.06,0.05,0.05,0.06]).values

# Inverse Logistic Function 
# https://en.wikipedia.org/wiki/Logistic_function
def func(x, L ,x0, k, b):
    y = 1/(L / (1 + np.exp(-k*(x-x0)))+b)
    return y

# FIT DATA

p0 = [max(y), np.median(x), 1, min(y)]  # a mandatory initial guess
popt, pcov = curve_fit(func, x, y, p0, method='dogbox', maxfev=10000)

# PERFORMANCE

modelPredictions = func(x, *popt)
absError = modelPredictions - y
SE = np.square(absError) # squared errors
MSE = np.mean(SE) # mean squared errors
RMSE = np.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (np.var(absError) / np.var(y))

print('Parameters:', popt)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

#PLOT

plt.figure()
plt.plot(x, y, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.yscale('log')
#plt.xscale('log')
plt.show()

Here is the result when this code is run... and what I would like to achieve!

[plot: the data with the code-generated red fitted curve and a hand-drawn blue target curve]

How can I better optimize curve_fit so that, instead of the code-generated red line, I get something closer to the hand-drawn blue line?

Thank you!!

From your plot of the data and expected fit, I would guess that you do not really want to model your data y as a logistic-like step function, but rather log(y).

So, I think you would probably want to use a logistic step function, perhaps adding a linear component, to model the log of this data. I would do this with lmfit, as it comes with the models built in, gives better reporting of results, and allows you to greatly simplify your fitting code (disclaimer: I am a lead author of lmfit):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from lmfit.models import StepModel, LinearModel

# DATA
x = pd.Series([1, 1, 264, 882, 913, 1095, 1156, 1217, 1234, 1261, 1278,
              1460, 1490, 1490, 1521, 1578, 1612, 1612, 1668, 1702, 1704,
              1735, 1793, 2024, 2039, 2313, 2313, 2558, 2558, 2617, 2617,
              2708, 2739, 2770, 2770, 2831, 2861, 2892, 2892, 2892, 2892,
              2892, 2923, 2923, 2951, 2951, 2982, 2982, 3012, 3012, 3012,
              3012, 3012, 3012, 3012, 3073, 3073, 3073, 3104, 3104, 3104,
              3104, 3135, 3135, 3135, 3135, 3165, 3165, 3165, 3165, 3165,
              3196, 3196, 3196, 3226, 3226, 3257, 3316, 3347, 3347, 3347,
              3347, 3377, 3377, 3438, 3469, 3469]).values

y = pd.Series([1000, 600, 558.659217877095, 400, 300, 100, 7.75, 6, 8.54,
              6.66666666666667, 7.14, 1.1001100110011, 1.12, 0.89, 1, 2,
              0.666666666666667, 0.77, 1.12612612612613, 0.7,
              0.664010624169987, 0.65, 0.51, 0.445037828215398, 0.27, 0.1,
              0.26, 0.1, 0.1, 0.13, 0.16, 0.1, 0.13, 0.1, 0.12, 0.1, 0.13,
              0.14, 0.14, 0.17, 0.11, 0.15, 0.09, 0.1, 0.26, 0.16, 0.09,
              0.09, 0.05, 0.09, 0.09, 0.1, 0.1, 0.11, 0.11, 0.09, 0.09,
              0.11, 0.08, 0.09, 0.09, 0.1, 0.06, 0.07, 0.07, 0.09, 0.05,
              0.05, 0.06, 0.07, 0.08, 0.08, 0.07, 0.1, 0.08, 0.08, 0.05,
              0.06, 0.04, 0.04, 0.05, 0.05, 0.04, 0.06, 0.05, 0.05, 0.06]).values

model = StepModel(form='logistic') + LinearModel()
params = model.make_params(amplitude=-5, center=1000, sigma=100, intercept=0, slope=0)

result = model.fit(np.log(y), params, x=x)

print(result.fit_report())

plt.plot(x, y, 'ko', label="Original Noised Data")
plt.plot(x, np.exp(result.best_fit), 'r-', label="Fitted Curve")
plt.legend()
plt.yscale('log')
plt.show()

That will print out a report with fit statistics and best-fit values of:

[[Model]]
    (Model(step, form='logistic') + Model(linear))
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 73
    # data points      = 87
    # variables        = 5
    chi-square         = 9.38961801
    reduced chi-square = 0.11450754
    Akaike info crit   = -183.688405
    Bayesian info crit = -171.358865
[[Variables]]
    amplitude: -4.89008796 +/- 0.29600969 (6.05%) (init = -5)
    center:     1180.65823 +/- 15.2836422 (1.29%) (init = 1000)
    sigma:      94.0317580 +/- 18.5328976 (19.71%) (init = 100)
    slope:     -0.00147861 +/- 8.1151e-05 (5.49%) (init = 0)
    intercept:  6.95177838 +/- 0.17170849 (2.47%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(amplitude, slope)     = -0.798
    C(amplitude, sigma)     = -0.649
    C(amplitude, intercept) = -0.605
    C(center, intercept)    = -0.574
    C(sigma, slope)         =  0.542
    C(sigma, intercept)     =  0.348
    C(center, sigma)        = -0.335
    C(amplitude, center)    =  0.282

and produce a plot like this

[plot: the data and the lmfit logistic-step + line fit on a log y-scale]

You could certainly reproduce all that with scipy.optimize.curve_fit if you desired, but I would leave that as an exercise.
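For reference, a minimal sketch of that exercise might look like the following, assuming the same logistic-step-plus-line form (lmfit's step with form='logistic' is amplitude / (1 + exp(-(x - center)/sigma)), and the initial guesses below are simply copied from the lmfit call above):

import numpy as np
from scipy.optimize import curve_fit

# Same model as StepModel(form='logistic') + LinearModel(), written by hand.
def logistic_step_line(x, amplitude, center, sigma, slope, intercept):
    return amplitude / (1.0 + np.exp(-(x - center) / sigma)) + slope * x + intercept

# x and y are the arrays defined above; fit the log of y, as before.
popt, pcov = curve_fit(logistic_step_line, x, np.log(y),
                       p0=[-5, 1000, 100, 0, 0], maxfev=10000)
print(popt)  # should land close to the lmfit best-fit values reported above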

In your case I'd fit a hyperbolic tangent¹ to the base-10 logarithm of your data.

Let's use

log₁₀(y) = y₀ − a tanh(λ(x − x₀))

as your function

Your x runs approximately from 0 to 3500 and your log₁₀(y) from 3 to −1. Using the fact that tanh(2) ≈ 1 and tanh(−2) ≈ −1, we have

y₀ + a = 3, y₀ − a = −1 ⇒ y₀ = 1, a = 2;

λ = (2 − (−2)) / (3500 − 0) = 4/3500; x₀ = (3500 − 0)/2 = 1750.

(This rough estimate is necessary to provide curve_fit with an initial guess; otherwise the procedure gets lost.)

Omitting the boilerplate, I eventually have

X = np.linspace(0, 3500, 701)
plt.scatter(x, np.log10(y), label='data')
plt.plot(X, 1-2*np.tanh(4/3500*(X-1750)), label='hand fit')
(y0, a, l, x0), *_ = curve_fit(
    lambda x, y0, a, l, x0: y0 - a*np.tanh(l*(x - x0)),
    x, np.log10(y),
    p0=[1, 2, 4/3500, 3500/2])
plt.plot(X, y0-a*np.tanh(l*(X-x0)), label='curve_fit fit')
plt.legend()

[plot: the data, the hand fit, and the curve_fit fit of the tanh model on log₁₀(y)]


Note 1: the logistic function is the hyperbolic tangent in disguise: 1/(1 + e⁻ˣ) = (1 + tanh(x/2))/2.
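A quick numerical sanity check of that identity, as a minimal sketch:

import numpy as np

# Check that 1/(1 + exp(-x)) equals (1 + tanh(x/2))/2 on a grid of points.
x = np.linspace(-10.0, 10.0, 1001)
logistic = 1.0 / (1.0 + np.exp(-x))
shifted_tanh = 0.5 * (1.0 + np.tanh(x / 2.0))
print(np.allclose(logistic, shifted_tanh))  # True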

I see that your plot uses log scaling, and I found that several different sigmoidal equations gave what appear to be good fits to the natural log of the Y data. Here is a graphical Python fitter using the natural log of the Y data with a four-parameter Logistic equation:

[plot: four-parameter logistic fit to the natural log of the Y data]

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings


xData = numpy.array([1,1,264,882,913,1095,1156,1217,1234,1261,1278,1460,1490,1490,1521,1578,1612,1612,1668,1702,1704,1735,1793,2024,2039,2313,2313,2558,2558,2617,2617,2708,2739,2770,2770,2831,2861,2892,2892,2892,2892,2892,2923,2923,2951,2951,2982,2982,3012,3012,3012,3012,3012,3012,3012,3073,3073,3073,3104,3104,3104,3104,3135,3135,3135,3135,3165,3165,3165,3165,3165,3196,3196,3196,3226,3226,3257,3316,3347,3347,3347,3347,3377,3377,3438,3469,3469], dtype=float)
yData = numpy.array([1000,600,558.659217877095,400,300,100,7.75,6,8.54,6.66666666666667,7.14,1.1001100110011,1.12,0.89,1,2,0.666666666666667,0.77,1.12612612612613,0.7,0.664010624169987,0.65,0.51,0.445037828215398,0.27,0.1,0.26,0.1,0.1,0.13,0.16,0.1,0.13,0.1,0.12,0.1,0.13,0.14,0.14,0.17,0.11,0.15,0.09,0.1,0.26,0.16,0.09,0.09,0.05,0.09,0.09,0.1,0.1,0.11,0.11,0.09,0.09,0.11,0.08,0.09,0.09,0.1,0.06,0.07,0.07,0.09,0.05,0.05,0.06,0.07,0.08,0.08,0.07,0.1,0.08,0.08,0.05,0.06,0.04,0.04,0.05,0.05,0.04,0.06,0.05,0.05,0.06], dtype=float)

# fit the natural log of the data
yData = numpy.log(yData)

warnings.filterwarnings("ignore") # do not print "invalid value" warnings during fit
def func(x, a, b, c, d): # Four-Parameter Logistic from zunzun.com
    return d + (a - d) / (1.0 + numpy.power(x / c, b))


# these are the same as the scipy defaults
initialParameters = numpy.array([1.0, 1.0, 1.0, 1.0])

# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)

modelPredictions = func(xData, *fittedParameters) 

print('Parameters:', fittedParameters)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Natural Log of Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

p0 = [max(y), np.median(x), 1, min(y)]  # a mandatory initial guess

Just to clarify, since this might be your issue: you shouldn't use 1.0 as your initial guess for k. You should use 1.0 / (max(x) - min(x)).

If your x data ranges over, say, [1200, 8000], then using 1.0 will never converge. You want to use 1/6800, so that the exponent's argument starts off normalized to roughly [-1, 1] over your x-range.

The initial guess is indeed mandatory, and inaccurate values for any of the four parameters can cause convergence to fail, usually because the very first evaluation fails (np.exp(4000) isn't going to evaluate well; it overflows).
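To illustrate the point, here is a minimal sketch on made-up data of a similar scale (the data, parameter values, and helper name are hypothetical, not from the question, and a plain logistic is used instead of the question's reciprocal form):

import numpy as np
from scipy.optimize import curve_fit

def logistic(x, L, x0, k, b):
    return L / (1 + np.exp(-k * (x - x0))) + b

# Hypothetical noisy data on a wide x-range, similar in scale to the question's.
rng = np.random.default_rng(0)
x = np.linspace(1200, 8000, 200)
y = logistic(x, 5.0, 4600.0, 0.002, 1.0) + rng.normal(0, 0.05, x.size)

# With k0 = 1.0, the first evaluation computes things like exp(1.0 * 3400),
# which overflows to inf; the model then degenerates to a hard step whose
# residuals barely respond to k, so the optimizer has nothing to work with.
# k0 = 1/(max - min) keeps the exponent's argument within roughly [-1, 1].
k0 = 1.0 / (x.max() - x.min())
p0 = [y.max(), np.median(x), k0, y.min()]
popt, _ = curve_fit(logistic, x, y, p0=p0, maxfev=10000)
print(popt)  # should recover something near L=5, x0=4600, k=0.002, b=1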
