I can't seem to figure out the syntax to score a logistic regression model.
import statsmodels.api as sm

logit = sm.Logit(data[response], sm.add_constant(data[features]))
model = logit.fit()
preds = model.predict(data[features])
This is the traceback I am getting:
      2 logit = sm.Logit(data[response], sm.add_constant(data[features]))
      3 model = logit.fit()
----> 4 preds = model.predict(data[features])

    878         exog = dmatrix(self.model.data.orig_exog.design_info.builder,
    879                        exog)
--> 880     return self.model.predict(self.params, exog, *args, **kwargs)
    881
    882

    376             exog = self.exog
    377         if not linear:
--> 378             return self.cdf(np.dot(exog, params))
    379         else:
    380             return np.dot(exog, params)

ValueError: matrices are not aligned
You are including the constant in the estimation but not in the prediction.
The explanatory variables used for prediction must have the same columns as the ones used in the estimation, including the constant if one was used there:
preds = model.predict(sm.add_constant(data[features]))
It is often useful to add a constant column to the data frame so we have a consistent set of variables including the constant.
Related: the formula interface also reapplies any transformations used in the model automatically in the call to predict.
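A small sketch of what that looks like, assuming `statsmodels.formula.api.logit` with made-up columns `x1`, `x2`, `y` and an `np.log` transform chosen only for illustration. The formula adds an `Intercept` automatically, and `predict` takes raw columns and rebuilds the intercept and `np.log(x2)` itself, so there is no `add_constant` step to forget.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data; x2 is kept positive so np.log is defined.
rng = np.random.default_rng(1)
data = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.uniform(1, 5, size=200)})
logits = -0.5 + 1.2 * data["x1"] + 0.8 * np.log(data["x2"])
data["y"] = (rng.uniform(size=200) < 1 / (1 + np.exp(-logits))).astype(int)

# The formula includes an intercept by default and records the np.log transform.
model = smf.logit("y ~ x1 + np.log(x2)", data=data).fit(disp=0)

# Pass raw columns only; the intercept and np.log(x2) are rebuilt from the formula.
new = pd.DataFrame({"x1": [0.0, 1.0], "x2": [1.0, 2.0]})
preds = model.predict(new)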
It looks like you also need to add the constant to the data you pass to predict. Assuming you're working with pandas, it might be easier to do:
data['constant'] = 1
Then add it to your features list. Alternatively, you can use the formula interface, statsmodels.formula.api.logit.