What is the first value that is passed into the StatsModels predict function?

I have the following OLS model from StatsModels:

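import statsmodels.api as sm
import statsmodels.tools.tools

X = df['Grade']
y = df['Results']

# Prepend a column of ones so the model estimates an intercept
X = statsmodels.tools.tools.add_constant(X)

mod = sm.OLS(y, X)
results = mod.fit()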

When trying to predict a new Y value for an X value of 4, I have to pass the following:

results.predict([1,4])

I don't understand why an array with the first value being '1' needs to be passed in order for the predict function to work correctly. Why do I need to include a 1 instead of just saying:

results.predict([4])

I'm not clear on the concept at work here. Does anybody know what's going on?

You are adding a constant to the regression equation with X = statsmodels.tools.tools.add_constant(X). So your regressor X has two columns, where the first column is an array of ones.

You need to do the same with the regressor that is used in prediction. So the 1 means "include the constant in the prediction". If you use zero instead, the contribution of the constant (0 * params[0]) is zero and the prediction contains only the slope effect (4 * params[1]).
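To make the arithmetic concrete, here is a minimal sketch that uses plain NumPy rather than StatsModels. The data is made up (it lies exactly on y = 2 + 3x), and the predict helper is a hypothetical stand-in for what a linear model's prediction amounts to: a dot product of the regressor row with the fitted parameters.

```python
import numpy as np

# Made-up data lying exactly on y = 2 + 3 * x (an assumption for illustration).
grade = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
result = 2.0 + 3.0 * grade

# Design matrix with a leading column of ones, like the one add_constant builds.
X = np.column_stack([np.ones_like(grade), grade])

# Least-squares fit; params[0] is the intercept, params[1] is the slope.
params, *_ = np.linalg.lstsq(X, result, rcond=None)

def predict(row):
    """Hypothetical helper: dot product of one regressor row with params."""
    return float(np.dot(row, params))

with_const = predict([1, 4])     # 1 * params[0] + 4 * params[1]
without_const = predict([0, 4])  # 0 * params[0] + 4 * params[1]

print(with_const, without_const)
```

The two results differ by exactly the intercept, which is why passing [4] alone cannot work: the model has two parameters, so each prediction row needs two entries.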

The formula interface adds the constant automatically both for the regressor in the model and for the regressor in the prediction. However, with the pandas DataFrame or numpy ndarray interface, the constant needs to be added by the user both for the model and for predict.

