
Test and validation in Python's statsmodels package

I have been able to fit the model using result = logit.fit().

Now, for the test and validation sets, should I just call result.predict(test_df[features]) and result.predict(vald_df[features])? Is that all, or am I missing a step? And how would this differ when I deploy the model for daily predictions?

I am new to statsmodels (I only started today) and I am short on time. The blogs I checked gave disjointed information, so I just wanted to be sure.

Also, is there a way to get the area under the ROC curve directly from statsmodels rather than coding it ourselves?

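For the first question: each ML algorithm (trees, logistic regression, ...) has hyperparameters, i.e. settings chosen before training. To find the best hyperparameters for an algorithm, we train several models with different values and keep the one that gives the best score on the validation set. That validation score, however, does not tell you how the model will perform in production, because it was used to pick the model and is therefore optimistic. To estimate production performance, evaluate the model with the best hyperparameters once on the test set; this final score gives you an idea of how the model will perform on new data. So calling result.predict(...) on the validation and test features is the right mechanic; the remaining steps are scoring those predictions and using the validation scores to choose between candidate models.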

For the second question, you can use scikit-learn. A quick search turned up these examples: http://www.programcreek.com/python/example/82598/sklearn.metrics.auc

