Python multiprocessing code running infinitely

Question

I am trying to train 2 models concurrently using sklearn and python's built-in multiprocessing library.

def train_model(model, X, y):
    model.fit(X, y)
    return model

from multiprocessing import Process

p1 = Process(target = train_model, args = (dt, X_train, y_train))
p2 = Process(target = train_model, args = (lr, X_train, y_train))

p1.start()
p2.start()

p1.join()
p2.join()

However, upon running this piece of code it continues to run infinitely. Training the two models individually doesn't take longer than a few seconds.

If my approach is wrong, how do I train 2 models parallelly?

Edit: Python version is 3.8.0. I am running this code on Jupyter Notebook on Windows 10.

Edit 2: The problem seems to lie with Jupyter Notebook. The same code runs without any problem on Google Colab.

Edit 3: I am now trying to run this code using my terminal

dt = DecisionTreeClassifier(class_weight='balanced')
lr = LogisticRegression(class_weight='balanced')


def train_model(model, X, y):
    model.fit(X, y)
    return model


p1 = Process(target=train_model, args=(dt, X_train, y_train))
p2 = Process(target=train_model, args=((lr, X_train, y_train)))

if __name__ == '__main__':
    p1.start()
    p2.start()
    p1.join()
    p2.join()

    dt_pred = dt.predict(X_test)
    lr_pred = lr.predict(X_test)

    print("Classification report for Decision Tree:",classification_report(y_test,dt_pred))
    print("Classification report for Logistic Regression", classification_report(y_test, lr_pred))

and get the following error

Traceback (most recent call last):
  File "D:/Bennett/HPC/E19CSE058_Lab3/E19CSE058_Lab3_Pt2.py", line 33, in <module>
    dt_pred = dt.predict(X_test)
  File "E:\Anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 436, in predict
    check_is_fitted(self)
  File "E:\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "E:\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 1041, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

It seems the training done through multiprocessing isn't being reflected outside the processes. How do I counter this?

Answer 1

Aaron has the right answer. On Windows, each process starts running your script over from the beginning, which will launch two more processes, each of which launches two more processes, etc. Anything that must be run ONLY in the master process needs to be protected by the "__main__" test:

from multiprocessing import Process

def train_model(model, X, y):
    model.fit(X, y)
    return model

def main():
    p1 = Process(target = train_model, args = (dt, X_train, y_train))
    p2 = Process(target = train_model, args = (lr, X_train, y_train))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

if __name__ == "__main__":
    main()

Python multiprocessing code running infinitely

Question

1 answers

solution1
2 2022-02-22 04:40:17

Python multiprocessing code running infinitely

Question

1 answers

solution1 2 2022-02-22 04:40:17

solution1
2 2022-02-22 04:40:17