
Train a logistic regression model in parts for big data

My data set consists of 1.6 million rows and 17000 columns after preprocessing. I want to use logistic regression on this data, however the process gets killed every time I load the dataset. Is there a way I can train a logistic regression model in chunks, with the coefficients being updated at each iteration? Does sklearn support any technique for my problem?

First, please read this. The time to train a logistic regression on a data set of that size is... a bit high. To avoid that, you can use the warm_start parameter of LogisticRegression in sklearn and loop over chunks of your data.

warm_start : bool, default: False
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See the Glossary.

(from here )

And to be more precise:

warm_start
When fitting an estimator repeatedly on the same dataset, but for multiple parameter values (such as to find the value maximizing performance as in grid search), it may be possible to reuse aspects of the model learnt from the previous parameter value, saving time. When warm_start is true, the existing fitted model attributes are used to initialise the new model in a subsequent call to fit.

(from here )
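A minimal sketch of that loop, assuming the data sits in a CSV named train.csv with a binary column called label (both names are placeholders), read in chunks with pandas:

import pandas as pd
from sklearn.linear_model import LogisticRegression

# warm_start=True: each call to fit() starts from the previously learnt
# coefficients instead of re-initialising them. The solver must not be
# 'liblinear' (warm_start is ignored there). max_iter is kept small so a
# single chunk only nudges the solution rather than fully re-fitting to it.
clf = LogisticRegression(warm_start=True, solver="saga", max_iter=10)

# chunksize controls how many rows are held in memory at once.
for chunk in pd.read_csv("train.csv", chunksize=100_000):
    X = chunk.drop(columns=["label"])
    y = chunk["label"]
    clf.fit(X, y)  # previous coefficients are reused as the starting point

Keep in mind that warm_start only reuses the previous solution as initialisation; each call to fit still minimises the loss on the current chunk alone, which is why max_iter is kept small here, so that a later chunk does not simply overwrite what earlier chunks contributed.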
