
How to fix .predict() function in statsmodels?

I'm trying to predict the temperature at 12 UTC tomorrow at one location. To forecast it, I use a basic linear regression model from the statsmodels module. My code is below:

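import statsmodels.api as sm

x = ds_main                            # predictors (DataFrame)
X = sm.add_constant(x)                 # add an intercept column named "const"
y = ds_target_t                        # target temperature
model = sm.OLS(y, X, missing='drop')   # drop rows containing NaN before fitting
results = model.fit()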

The summary shows that the fit is "good":

[Image: OLS regression summary]

But the problem appears when I try to predict values with a new dataset that I consider to be my test set. It has the same number of columns and the same variable names, but the .predict() function returns an array of NaN, even though my test set has values.

xnew = ts_main
Xnew = sm.add_constant(xnew)
ynewpred = results.predict(Xnew)

I really don't understand where the problem is.

UPDATE: I think I have an explanation: my Xnew DataFrame contains NaN values. In statsmodels, .fit() can drop missing values (NaN) thanks to missing='drop', but .predict() does not. Thus, it returns an array of NaN values.

That explains the "why", but I still don't know how to fix it.

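statsmodels.api.OLS by default will not accept data with NA values, so if you use it, you need to drop the NA values first. The same applies to the new data you pass to .predict(): every row of Xnew that contains a NaN gets a NaN prediction, so drop (or fill) those rows before predicting.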

However, if you use statsmodels.formula.api.ols, it will automatically drop the NA values when running the regression and making predictions.

So you can try this:

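import pandas as pd
import statsmodels.formula.api as smf
df = pd.concat([y, x], axis=1)  # target + raw predictors; the formula adds its own intercept
formula = f"{y.name} ~ " + " + ".join(x.columns)  # assumes y is a named Series
lm = smf.ols(formula, data=df).fit()  # rows with NA are dropped before fitting
lm.predict(xnew)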
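A self-contained sketch of the formula approach on made-up data (the column names `t`, `a` and `b` are hypothetical). It checks that rows containing NaN are left out of the fit. The new data passed to `predict` is complete, so the sketch does not depend on how missing values in the new data are handled.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data: t is the target, a and b are predictors
rng = np.random.default_rng(2)
df = pd.DataFrame({"a": rng.normal(size=30), "b": rng.normal(size=30)})
df["t"] = 0.5 + 1.0 * df["a"] - 2.0 * df["b"]
df.loc[[0, 5], "a"] = np.nan  # two incomplete rows

# Build "t ~ a + b" from the column names; the formula adds the intercept itself
predictors = ["a", "b"]
formula = "t ~ " + " + ".join(predictors)
lm = smf.ols(formula, data=df).fit()  # rows with NaN are dropped before fitting
print(lm.nobs)  # 28 complete rows out of 30

xnew = pd.DataFrame({"a": [1.0, 0.0], "b": [0.0, 1.0]})
print(lm.predict(xnew))  # approximately 1.5 and -1.5
```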
