
Score Statsmodels Logit

I can't seem to figure out the syntax to score a logistic regression model.

logit = sm.Logit(data[response],sm.add_constant(data[features]))
model = logit.fit()
preds = model.predict(data[features])

This is the traceback I am getting (sorry for the ugly format, didn't know how to fix it...)


      2 logit = sm.Logit(data[response],sm.add_constant(data[features]))
      3 model = logit.fit()
----> 4 preds = model.predict(data[features])

    878             exog = dmatrix(self.model.data.orig_exog.design_info.builder,
    879                     exog)
--> 880         return self.model.predict(self.params, exog, *args, **kwargs)
    881
    882

    376             exog = self.exog
    377         if not linear:
--> 378             return self.cdf(np.dot(exog, params))
    379         else:
    380             return np.dot(exog, params)

ValueError: matrices are not aligned

You are including the constant in the estimation but not in the prediction.

The explanatory variables used for prediction need to have the same columns as the ones used in the estimation, including a constant if one was used there:

preds = model.predict(sm.add_constant(data[features]))

It is often useful to add a constant column to the data frame so we have a consistent set of variables including the constant.

Related: with the formula interface, any transformations used in the model formula are also applied automatically in the call to predict.
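As a rough illustration of the formula route, the sketch below fits the same kind of model with statsmodels.formula.api.logit. The data and the column names x1, x2 and y are again hypothetical. The formula adds the intercept itself, so predict can be given the raw feature columns.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data; x1, x2 and y are hypothetical names used only in this sketch.
rng = np.random.default_rng(0)
data = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
linear = 0.5 + 1.0 * data["x1"] - 1.0 * data["x2"]
data["y"] = (rng.uniform(size=200) < 1 / (1 + np.exp(-linear))).astype(int)

# The formula includes an intercept by default; no add_constant call is needed.
model = smf.logit("y ~ x1 + x2", data=data).fit(disp=0)

# predict rebuilds the design matrix (including the intercept) from the raw columns.
preds = model.predict(data[["x1", "x2"]])
```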

It looks like you also need to add the constant to the data you pass to the predict method. Assuming you're working with pandas, it might be easier to do

data['constant'] = 1

Then add it to your features list. Alternatively, you can use the formula interface at statsmodels.formula.api.logit.
