
Predicting the intercept and coefficients of a linear regression model with multiple variables

I have the following equation:

P = B0 + B1*Var1 + B2*Var2

I have the values of P, Var1 and Var2. I tried to model this and then compute the coefficients and the intercept.

Below is the code and the output I am getting:

P = [1035.89, 1060.4, 1064, 1075.89, 1078.69, 1074.93, 1090.71, 1080.95, 1086.19,1080.46] # Total power

l = [51.275510204081634, 102.89115646258503, 160.7142857142857, 205.78231292517006, 256.80272108843536, 307.82312925170066, 360.5442176870748, 409.0136054421768, 460.03401360544217, 492.3469387755102]
t = [6.110918671507064, 12.262374116954474, 19.153625686813186, 24.524748233908948, 30.60526432496075, 36.685780416012555, 42.96898037676609, 48.7454706632653, 54.82598675431711, 58.67698027864992]


# build the design matrix with one row [t, l] per observation
X = []
for index in range(0, len(P)):
    row = []
    row.append(t[index])
    row.append(l[index])
    X.append(row)

print "Using statsmodels"
import statsmodels.api as sm

X = sm.add_constant(X)
est = sm.OLS(P, X).fit()

print(est.params[0])
print(est.params[1])
print(est.params[2])

I am getting the following results:

Using statsmodels
1048.32518503
0.0102496334198
0.0860026475829

Is this correct? Does est.params[0] refer to B0 in the equation? I get B0 in the range of 600-650 when I run experiments.

Could this mismatch be caused by wrong data?

I am not familiar with statsmodels, but here is an implementation using curve_fit (see the code below). In my opinion, the reason for the mismatch between the model prediction and your experimental result is that your model (B0 + B1*Var1 + B2*Var2) does not describe the data well (an exponential, log or sqrt term would probably be better). The following plots show the original data, the fit obtained with curve_fit (code below), and the fit using your parameters.

[Plot 1: original data and the curve_fit result. Plot 2: original data and the fit using your statsmodels parameters.]

As you can see, both fitting functions give the same result; however, in my opinion your data should be modelled by a different function. If I find time, I will look for a function that fits your data better (a sketch of one possible alternative follows the code below).
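As to your first question: if I read the statsmodels documentation correctly, sm.add_constant prepends a column of ones by default, so est.params[0] should indeed be B0 (the intercept), and est.params[1] and est.params[2] follow the column order of your X (t first, then l). A minimal sketch that makes the mapping explicit by labelling the columns, assuming pandas is available:

import pandas as pd
import statsmodels.api as sm

# data from the question
P = [1035.89, 1060.4, 1064, 1075.89, 1078.69, 1074.93, 1090.71, 1080.95, 1086.19, 1080.46]
l = [51.275510204081634, 102.89115646258503, 160.7142857142857, 205.78231292517006, 256.80272108843536, 307.82312925170066, 360.5442176870748, 409.0136054421768, 460.03401360544217, 492.3469387755102]
t = [6.110918671507064, 12.262374116954474, 19.153625686813186, 24.524748233908948, 30.60526432496075, 36.685780416012555, 42.96898037676609, 48.7454706632653, 54.82598675431711, 58.67698027864992]

# labelled design matrix; add_constant adds an explicit "const" column for the intercept
X = pd.DataFrame({"t": t, "l": l})
X = sm.add_constant(X)

est = sm.OLS(P, X).fit()
# est.params is now a labelled Series: "const" is B0, "t" and "l" are the slopes
print(est.params)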

from scipy.optimize import curve_fit
import numpy as np 
import matplotlib.pyplot as plt

P = [1035.89, 1060.4, 1064, 1075.89, 1078.69, 1074.93, 1090.71, 1080.95, 1086.19,1080.46] # Total power
l = [51.275510204081634, 102.89115646258503, 160.7142857142857, 205.78231292517006, 256.80272108843536, 307.82312925170066, 360.5442176870748, 409.0136054421768, 460.03401360544217, 492.3469387755102]
t = [6.110918671507064, 12.262374116954474, 19.153625686813186, 24.524748233908948, 30.60526432496075, 36.685780416012555, 42.96898037676609, 48.7454706632653, 54.82598675431711, 58.67698027864992]

# your model
def func(x, b0, b1, b2):
    var1, var2 = x
    return b0 + np.dot(b1, var1) + np.dot(b2, var2)

# Curve fit
coeff, _ = curve_fit(func, (l, t), P)
b0, b1, b2 = coeff[0], coeff[1], coeff[2]
print(b0, b1, b2)

# plot the data
xval = range(1 ,len(P)+1)
plt.scatter(xval, P, s=30, marker = "v", label='P')
plt.scatter(xval, func((l,t), *coeff), s=30, marker = "v", color="red", label='curvefit')
plt.legend(loc='upper left')
plt.figure()
plt.scatter(xval, P, s=30, marker = "v", label='P')
# note: statsmodels returned the parameters in the order (const, t, l),
# so the l and t coefficients are swapped here to match func's (var1=l, var2=t)
plt.scatter(xval, func((l, t), 1048.32518503, 0.0860026475829, 0.0102496334198), s=30, marker = "v",color="black",label='your parameter')
plt.legend(loc='upper left')
plt.show()
print "residuals curve_fit:",((P - func((l,t), *coeff))**2).sum()
print "residuals stats:",((P - func((l,t), 1048.32518503,0.086002647582,0.0102496334198))**2).sum()
