Multiprocessing in Logistic Regression in Python

I am using scikit-learn's LogisticRegression.

It works fine, except that it takes a long time to finish.

I decided to use the multiprocessing option (n_jobs=-1), as described at https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

However, performance did not change.

Here is my code:

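from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# X and y are the feature matrix and labels (not shown in the question)
mdl = LogisticRegression(n_jobs=-1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
mdl.fit(X_train, y_train)
y_pred = mdl.predict(X_test)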

How can I make LogisticRegression use multiple cores?

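Are you doing multiclass classification? According to the documentation, n_jobs only sets the number of CPU cores used when parallelizing over classes with multi_class='ovr', and it is ignored when the solver is 'liblinear'. If your data does not have more than two classes, setting the n_jobs argument is virtually useless.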
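To illustrate what "parallelizing over classes" means, here is a minimal sketch that makes the one-vs-rest scheme explicit with sklearn.multiclass.OneVsRestClassifier, which has its own n_jobs argument. The synthetic four-class dataset is an assumption standing in for your data, and whether this is actually faster depends on the number of classes and the cost of each fit.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 4-class problem (assumption: stands in for the real X and y).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10, n_classes=4, random_state=42
)

# One binary LogisticRegression is fitted per class; n_jobs=-1 lets joblib
# spread those independent fits across the available cores.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000), n_jobs=-1)
ovr.fit(X, y)

n_binary_models = len(ovr.estimators_)
print("binary models fitted:", n_binary_models)
```

With only two classes there is a single binary problem to fit, so there is nothing to distribute.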

To improve speed, try feature engineering to reduce the number of features.
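One possible way to reduce the number of features is univariate feature selection. The sketch below is only an example: the synthetic dataset, the choice of SelectKBest with f_classif, and k=20 are all assumptions, and a different reduction method may suit your data better.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for X and y (assumption: 200 features, 10 informative).
X, y = make_classification(
    n_samples=2000, n_features=200, n_informative=10, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Keep the 20 features with the highest ANOVA F-score, then fit on those only.
mdl = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
mdl.fit(X_train, y_train)

# Number of features that survived selection, and held-out accuracy.
n_kept = int(mdl.named_steps["selectkbest"].get_support().sum())
accuracy = mdl.score(X_test, y_test)
print("features kept:", n_kept)
print("test accuracy:", accuracy)
```

Putting the selector inside the pipeline means it is fitted on the training split only, so the test score is not influenced by the selection step.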

You could also try changing the solver. Here's what the documentation says:
"For small datasets, 'liblinear' (used to be the default) is a good choice, whereas 'sag' and 'saga' are faster for large ones. For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss; 'liblinear' is limited to one-versus-rest schemes."
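To see which solver is fastest on your data, you can simply time each one. This is a rough sketch on a synthetic binary dataset (an assumption); the timings depend heavily on dataset size, feature scaling, and hardware, so no particular ranking is implied.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary dataset (assumption: stands in for the real X and y).
X, y = make_classification(n_samples=5000, n_features=50, random_state=42)

fit_seconds = {}
for solver in ("liblinear", "lbfgs", "sag", "saga"):
    # sag and saga converge best on features with similar scale, hence the scaler.
    mdl = make_pipeline(
        StandardScaler(), LogisticRegression(solver=solver, max_iter=1000)
    )
    start = time.perf_counter()
    mdl.fit(X, y)
    fit_seconds[solver] = time.perf_counter() - start

for solver, seconds in fit_seconds.items():
    print(f"{solver}: {seconds:.3f} s")
```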

There are also some parameters, such as tol, that you could try changing. A larger tol lets the solver stop earlier, at the cost of a less precise fit.
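As a small sketch of the effect of tol, the snippet below fits the same model with the default tolerance and a looser one and compares the iteration counts. The dataset and the value 1e-2 are assumptions chosen for illustration; check that accuracy stays acceptable before loosening tol on real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary dataset (assumption: stands in for the real X and y).
X, y = make_classification(n_samples=5000, n_features=50, random_state=42)

# tol defaults to 1e-4; a looser tolerance lets the solver stop earlier.
strict = LogisticRegression(tol=1e-4, max_iter=1000).fit(X, y)
loose = LogisticRegression(tol=1e-2, max_iter=1000).fit(X, y)

# n_iter_ reports how many iterations each fit actually ran.
print("tol=1e-4:", strict.n_iter_[0], "iterations")
print("tol=1e-2:", loose.n_iter_[0], "iterations")
```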

Finally, if nothing works, use another model.
