
how to fit a function to data in python

I want to fit a function to the independent (X) and dependent (y) variables:

import numpy as np
y = np.array([1.45952016, 1.36947283, 1.31433227, 1.24076599, 1.20577963,
       1.14454815, 1.13068077, 1.09638278, 1.08121406, 1.04417094,
       1.02251471, 1.01268524, 0.98535659, 0.97400591])
X = np.array([4.571428571362048, 8.771428571548313, 12.404761904850602, 17.904761904850602,
            22.904761904850602, 31.238095237873495, 37.95833333302289, 
            44.67857142863795, 51.39880952378735, 64.83928571408615, 
            71.5595238097012, 85., 98.55357142863795, 112.1071428572759])

I already tried the scipy package, in this way:

from scipy.optimize import curve_fit

def func(x, a, b, c):
    return 1 / (a * x**2 + b * x + c)

g = [1, 1, 1]  # initial guess for a, b, c
c, cov = curve_fit(func, X, y, p0=g)

test_ar = np.arange(min(X), max(X), 0.25)
pred = func(test_ar, *c)  # func is already vectorized, so no loop is needed

I could add higher-order polynomial terms to make func more accurate, but I want to keep it simple. I would very much appreciate some help on finding a better function or improving my prediction. The figure also shows the result of the prediction:

[figure: data points with the fitted curve]

The first thing you want to do is specify how you measure "accuracy", which in your case is not really the appropriate term.

What you are essentially doing is called linear regression. Suitable metrics in this case are mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE). It is up to you to decide which metric to use and what threshold to set for being "acceptable".
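A minimal sketch of computing those metrics with sklearn.metrics (the y_true / y_pred values here are made-up illustrative numbers, not the output of your actual fit):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Made-up observed vs. predicted values, purely for illustration
y_true = np.array([1.45, 1.36, 1.31, 1.24])
y_pred = np.array([1.44, 1.37, 1.30, 1.25])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                        # in the same units as y
mae = mean_absolute_error(y_true, y_pred)
print(mse, rmse, mae)
```

RMSE is usually the easiest to interpret, since it is in the same units as y.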

The image that you are showing above (where you've fitted the line) looks fine, BUT please expand your X-axis from -100 to 300 and show us the image again. This is a problem with high-degree polynomials.
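This extrapolation problem is easy to demonstrate numerically (a quick sketch using np.polyfit, which solves the same cubic least-squares problem as regressing y on x, x^2, x^3):

```python
import numpy as np

x = np.array([4.571428571362048, 8.771428571548313, 12.404761904850602,
              17.904761904850602, 22.904761904850602, 31.238095237873495,
              37.95833333302289, 44.67857142863795, 51.39880952378735,
              64.83928571408615, 71.5595238097012, 85., 98.55357142863795,
              112.1071428572759])
y = np.array([1.45952016, 1.36947283, 1.31433227, 1.24076599, 1.20577963,
              1.14454815, 1.13068077, 1.09638278, 1.08121406, 1.04417094,
              1.02251471, 1.01268524, 0.98535659, 0.97400591])

# Cubic least-squares fit
coeffs = np.polyfit(x, y, deg=3)
poly = np.poly1d(coeffs)

# Inside the data range (x roughly 4.6 to 112) the fit tracks the data,
# but outside it the cubic term dominates and the curve diverges
print(poly(50))                # within range: close to the observed y values
print(poly(-100), poly(300))   # far outside the [0.97, 1.46] range of y
```

At x = -100 and x = 300 the fitted cubic is nowhere near the observed range of y, which is exactly what the expanded plot would show.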

This is a 101 example of how to use regression in scikit-learn. In your case, if you want to use x^2 or x^3 for predicting y, you just need to add them to the data. Currently your X variable is an array (a vector); you need to expand it into a matrix where each column is a feature (x, x^2, x^3, ...).

Here is some code:

import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score

y = [1.45952016, 1.36947283, 1.31433227, 1.24076599, 
 1.20577963, 1.14454815, 1.13068077, 1.09638278, 
 1.08121406, 1.04417094, 1.02251471, 1.01268524, 0.98535659, 
 0.97400591]

x = [4.571428571362048, 8.771428571548313, 12.404761904850602, 
 17.904761904850602, 22.904761904850602, 31.238095237873495,
 37.95833333302289, 44.67857142863795, 51.39880952378735, 
 64.83928571408615, 71.5595238097012, 85., 98.55357142863795, 112.1071428572759]

df = pd.DataFrame({
    'x' : x,
    'x^2': [i**2 for i in x],
    'x^3': [i**3 for i in x],
    'y': y
})

X = df[['x','x^2','x^3']]
y = df['y']

model = linear_model.LinearRegression()
model.fit(X, y)
y1 = model.predict(X)

coef = model.coef_
intercept = model.intercept_

[figure: linear regression fit]

You can see the coefficients in the coef variable:

array([-1.67456732e-02,  2.03899728e-04, -8.70976426e-07])

You can see the intercept in the intercept variable:

1.5042389677980577

which in your case means: y1 = -1.67e-2*x + 2.03e-4*x^2 - 8.70e-7*x^3 + 1.5
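As a sanity check (a sketch, not part of the original answer), you can rebuild model.predict by hand from intercept_ and coef_, and score the fit with r2_score:

```python
import numpy as np
from sklearn import linear_model
from sklearn.metrics import r2_score

x = np.array([4.571428571362048, 8.771428571548313, 12.404761904850602,
              17.904761904850602, 22.904761904850602, 31.238095237873495,
              37.95833333302289, 44.67857142863795, 51.39880952378735,
              64.83928571408615, 71.5595238097012, 85., 98.55357142863795,
              112.1071428572759])
y = np.array([1.45952016, 1.36947283, 1.31433227, 1.24076599, 1.20577963,
              1.14454815, 1.13068077, 1.09638278, 1.08121406, 1.04417094,
              1.02251471, 1.01268524, 0.98535659, 0.97400591])

# Same feature matrix as the pandas version above: columns x, x^2, x^3
X = np.column_stack([x, x**2, x**3])

model = linear_model.LinearRegression().fit(X, y)
y1 = model.predict(X)

# Rebuild the predictions manually from the learned parameters
manual = model.intercept_ + X @ model.coef_
print(np.allclose(y1, manual))  # the two agree

print(r2_score(y, y1))  # closeness of fit on the training data
```

Note that r2_score here is computed on the training data, so it only tells you how well the cubic describes these 14 points, not how well it would extrapolate.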
