
Predicting the intercept and coefficients of a linear regression model with multiple variables

I have the following equation:

P = B0 + B1*Var1 + B2*Var2

I have the values of P, Var1 and Var2. I tried to model this and then compute the coefficients and the intercept.

Below is the code and the output I am getting:

P = [1035.89, 1060.4, 1064, 1075.89, 1078.69, 1074.93, 1090.71, 1080.95, 1086.19,1080.46] # Total power

l = [51.275510204081634, 102.89115646258503, 160.7142857142857, 205.78231292517006, 256.80272108843536, 307.82312925170066, 360.5442176870748, 409.0136054421768, 460.03401360544217, 492.3469387755102]
t = [6.110918671507064, 12.262374116954474, 19.153625686813186, 24.524748233908948, 30.60526432496075, 36.685780416012555, 42.96898037676609, 48.7454706632653, 54.82598675431711, 58.67698027864992]


# build the design matrix with one row [t, l] per observation
X = []
for index in range(0, len(P)):
    row = []
    row.append(t[index])
    row.append(l[index])
    X.append(row)

print "Using statsmodels"
import statsmodels.api as sm

X = sm.add_constant(X)
est = sm.OLS(P, X).fit()

print(est.params[0])
print(est.params[1])
print(est.params[2])

I am getting the following results:

Using statsmodels
1048.32518503
0.0102496334198
0.0860026475829

Is this correct? Does est.params[0] refer to B0 in the equation? I get B0 in the range of 600-650 when I run experiments.

Could this mismatch be caused by wrong data?

I am not familiar with statsmodels, but here is an implementation using curve_fit (see the code below). In my opinion, the reason for the mismatch between the model prediction and your experimental result is that your model (B0 + B1*Var1 + B2*Var2) does not describe the data well (an exponential, log or sqrt term would probably be better). The following plots show the original data, the fit obtained with curve_fit (code below), and the fit using your parameters.

[Plot 1: original data and the curve_fit result. Plot 2: original data and the fit using your statsmodels parameters.]

As you can see, both fitting functions give the same result; however, in my opinion your data should be modelled by a different function. If I find time, I will look for a function that fits your data better (a sketch of one possible alternative follows the code below).
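As to your first question: if I read the statsmodels documentation correctly, sm.add_constant prepends a column of ones by default, so est.params[0] should indeed be B0 (the intercept), and est.params[1] and est.params[2] follow the column order of your X (t first, then l). A minimal sketch that makes the mapping explicit by labelling the columns, assuming pandas is available:

import pandas as pd
import statsmodels.api as sm

# data from the question
P = [1035.89, 1060.4, 1064, 1075.89, 1078.69, 1074.93, 1090.71, 1080.95, 1086.19, 1080.46]
l = [51.275510204081634, 102.89115646258503, 160.7142857142857, 205.78231292517006, 256.80272108843536, 307.82312925170066, 360.5442176870748, 409.0136054421768, 460.03401360544217, 492.3469387755102]
t = [6.110918671507064, 12.262374116954474, 19.153625686813186, 24.524748233908948, 30.60526432496075, 36.685780416012555, 42.96898037676609, 48.7454706632653, 54.82598675431711, 58.67698027864992]

# labelled design matrix; add_constant adds an explicit "const" column for the intercept
X = pd.DataFrame({"t": t, "l": l})
X = sm.add_constant(X)

est = sm.OLS(P, X).fit()
# est.params is now a labelled Series: "const" is B0, "t" and "l" are the slopes
print(est.params)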

from scipy.optimize import curve_fit
import numpy as np 
import matplotlib.pyplot as plt

P = [1035.89, 1060.4, 1064, 1075.89, 1078.69, 1074.93, 1090.71, 1080.95, 1086.19,1080.46] # Total power
l = [51.275510204081634, 102.89115646258503, 160.7142857142857, 205.78231292517006, 256.80272108843536, 307.82312925170066, 360.5442176870748, 409.0136054421768, 460.03401360544217, 492.3469387755102]
t = [6.110918671507064, 12.262374116954474, 19.153625686813186, 24.524748233908948, 30.60526432496075, 36.685780416012555, 42.96898037676609, 48.7454706632653, 54.82598675431711, 58.67698027864992]

# your model
def func(x, b0, b1, b2):
    var1, var2 = x
    return b0 + np.dot(b1, var1) + np.dot(b2, var2)

# Curve fit
coeff, _ = curve_fit(func, (l, t), P)
b0, b1, b2 = coeff[0], coeff[1], coeff[2]
print(b0, b1, b2)

# plot the data
xval = range(1 ,len(P)+1)
plt.scatter(xval, P, s=30, marker = "v", label='P')
plt.scatter(xval, func((l,t), *coeff), s=30, marker = "v", color="red", label='curvefit')
plt.legend(loc='upper left')
plt.figure()
plt.scatter(xval, P, s=30, marker = "v", label='P')
# note: statsmodels returned the parameters in the order (const, t, l),
# so the l and t coefficients are swapped here to match func's (var1=l, var2=t)
plt.scatter(xval, func((l, t), 1048.32518503, 0.0860026475829, 0.0102496334198), s=30, marker = "v",color="black",label='your parameter')
plt.legend(loc='upper left')
plt.show()
print "residuals curve_fit:",((P - func((l,t), *coeff))**2).sum()
print "residuals stats:",((P - func((l,t), 1048.32518503,0.086002647582,0.0102496334198))**2).sum()
