I'm using the Old Faithful Geyser Dataset to learn some introductory linear regression and prediction. The dataset contains two features: the time since the last eruption and the duration of the subsequent eruption:
   eruptions  waiting
0      3.600       79
1      1.800       54
2      3.333       74
3      2.283       62
4      4.533       85
Here's the model:
import statsmodels.api as sm
X = faithful.waiting
X = sm.add_constant(X)
y = faithful.eruptions
model = sm.OLS(y, X)
results = model.fit()
My question arises when trying to make a prediction using predict(). For example, if I have just waited 75 minutes, how long will this eruption last?
results.predict([1, 75]) # 1 needs to be passed, I don't know why
Why pass 1? Is it because add_constant(X) added a column of 1s?
   const  waiting
0    1.0       79
1    1.0       54
2    1.0       74
3    1.0       62
4    1.0       85
The docs say that the main argument is "Parameters of a linear model". The only variable is the waiting time (75 minutes), and the regression line has its own intercept anyway (-1.87):
>>> results.params
const     -1.874016
waiting    0.075628
dtype: float64
This answer notes that "model.predict doesn't know about the parameters, and requires them in the call". However, if that's the case, would -1.87 not be the best argument?
Any help is greatly appreciated, thanks.
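You are learning parameters for an equation of the form y = Ax + b.
Your model says that A = 0.075628 and b = -1.87 (b is the bias term, i.e. the intercept).
results.predict() already holds these fitted parameters. What it needs from you is a row of inputs laid out like a row of X, with one value per column: const first, then waiting. So you pass two values: 1 is the multiplier for b, and 75 is x. The prediction is y = 0.075628 * 75 + (-1.87) * 1, which is about 3.80.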
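You can also let add_constant build that row instead of typing the 1 yourself: results.predict(sm.add_constant([75], has_constant='add')). The has_constant='add' is needed here because a single value already looks like a constant column, so the default setting ('skip') would not add the 1.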
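To see where the 1 goes, here is a minimal sketch of the arithmetic behind the prediction for a plain OLS fit like this one. It uses only the two coefficients printed by results.params in the question. The helper name predict_row is hypothetical and is not part of statsmodels; it just spells out the dot product between one input row and the coefficients.

```python
# Coefficients copied from results.params in the question.
params = {"const": -1.874016, "waiting": 0.075628}

def predict_row(row, coefs):
    """Hypothetical helper: dot product of one input row with the coefficients."""
    return sum(x * b for x, b in zip(row, coefs))

# Coefficients in the same order as the columns of X: const, then waiting.
coefs = [params["const"], params["waiting"]]

# One input row laid out like a row of X: 1 for const, 75 for waiting.
row = [1.0, 75.0]
prediction = predict_row(row, coefs)

# The same thing written as the line equation y = b + A * x.
manual = params["const"] + params["waiting"] * 75

print(round(prediction, 2))  # about 3.8
```

The 1 is multiplied by the intercept and the 75 by the slope, so [1, 75] is a row of inputs, not a set of parameters. That is also why passing -1.87 would not work: it is a coefficient the fitted results already store, not an input value.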