Meaning of arguments passed to statsmodels OLS.predict

I'm using the Old Faithful Geyser Dataset to learn some introductory linear regression and prediction. The dataset contains two features, the time since the last eruption (waiting) and the duration of the subsequent eruption (eruptions):

    eruptions   waiting
0   3.600   79
1   1.800   54
2   3.333   74
3   2.283   62
4   4.533   85

Here is the summary data:

[Image: summary of the Old Faithful dataset]

Here's the model:

import statsmodels.api as sm

X = faithful.waiting
X = sm.add_constant(X)
y = faithful.eruptions

model = sm.OLS(y, X)
results = model.fit()

My question arises when trying to make a prediction using predict(). For example, if I have just waited 75 minutes, how long will this eruption last?

results.predict([1, 75]) # 1 needs to be passed, I don't know why

Why pass 1? Is it because add_constant(X) added the column of 1s below?

    const   waiting
0   1.0   79
1   1.0   54
2   1.0   74
3   1.0   62
4   1.0   85

The docs say that the main argument is "Parameters of a linear model". The only variable is wait time (75 mins), and the regression line has its own intercept anyway (-1.87):

results.params
>>>
const     -1.874016
waiting    0.075628
dtype: float64

This answer notes that:

model.predict doesn't know about the parameters, and requires them in the call

However, if that's the case, would -1.87 not be the best argument?

Any help is greatly appreciated, thanks.

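You are learning parameters for an equation of the form y = Ax + b. Your model says that A = 0.075628 and b = -1.874016 (b is the bias term, i.e. the intercept). The list you pass to predict is one row of the design matrix: 1 is the multiplier for b (it matches the const column that add_constant created), and 75 is the value of x.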
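To make the arithmetic concrete, here is a minimal sketch in plain Python that reproduces the prediction by hand from the coefficients reported in the question. The variable names are illustrative only and are not part of statsmodels.

```python
# Coefficients copied from the results.params output in the question.
const = -1.874016        # b, the intercept
waiting_coef = 0.075628  # A, the slope for waiting

# One row of the design matrix: 1 for the const column, 75 for waiting.
row = [1, 75]
params = [const, waiting_coef]

# The prediction is the dot product of the row with the parameters:
# 1 * b + 75 * A
prediction = sum(x * p for x, p in zip(row, params))
print(round(prediction, 4))  # roughly 3.7981 minutes
```

The 1 does not carry any information about the new observation. It is only there so that the intercept is multiplied by something and ends up in the sum.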

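You can also let add_constant build that row for you: results.predict(sm.add_constant([75], has_constant='add')). Passing has_constant='add' forces the constant column to be added. With a single row, the input itself looks constant, so the default setting ('skip') would leave it out.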
