I want to save an sklearn Pipeline (a custom preprocessing step plus a RandomForestClassifier) to disk, with all of its dependencies bundled inside the saved file. Without this, I have to copy all the dependencies (custom modules) into the same folder everywhere I want to call this model (in my case, on a remote server).
The preprocessor is defined in a class that lives in another file (preprocessing.py) in the same folder of my project, so I access it through an import.
training.py
from preprocessing import Preprocessor
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
import pickle
clf = Pipeline([
    ("preprocessing", Preprocessor()),
    ("model", RandomForestClassifier())
])

# some fitting of the classifier
# ...

# Export
savepath = "model.pkl"  # wherever the model should be written
with open(savepath, "wb") as handle:
    pickle.dump(clf, handle, protocol=pickle.HIGHEST_PROTOCOL)
I tried pickle (and some of its variants), dill, and joblib, but none of them worked: when I load the .pkl somewhere else (say, on my remote server), I must have an identical preprocessing.py in place... which is a pain.
What I would love is to have another file somewhere else:
remote.py
import pickle
with open(savepath, "rb") as handle:
    model = pickle.load(handle)
print(model.predict(some_matrix))
But this code currently gives me an error because it cannot find the Preprocessor class...
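For reference, cloudpickle (version 2.0+) offers register_pickle_by_value, which embeds a module's code in the payload instead of a reference, so the remote side no longer needs preprocessing.py. A sketch of the idea, using a throwaway in-memory module as a stand-in for preprocessing.py (the module and class names here are hypothetical):

```python
import sys
import types
import pickle
import cloudpickle

# hypothetical stand-in for preprocessing.py: a module created on the fly
pre = types.ModuleType("preprocessing_demo")

class Preprocessor:
    def transform(self, X):
        return X

Preprocessor.__module__ = "preprocessing_demo"
pre.Preprocessor = Preprocessor
sys.modules["preprocessing_demo"] = pre

# embed this module's classes by value in the payload, not by reference
cloudpickle.register_pickle_by_value(pre)
payload = cloudpickle.dumps(pre.Preprocessor())

# simulate the remote server: the module is gone, yet loading still works
del sys.modules["preprocessing_demo"]
obj = pickle.loads(payload)
print(obj.transform([1, 2, 3]))  # → [1, 2, 3]
```

In the asker's setup this would amount to `import preprocessing; cloudpickle.register_pickle_by_value(preprocessing)` before dumping the fitted Pipeline. Note that cloudpickle itself must still be installed on the loading side.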
I'm facing an identical issue right now. To address it, I am going to turn my pipeline/model, along with all of its dependencies (the preprocessing classes), into a Python package using setuptools, so that it is self-contained and can be run anywhere (remote server, Docker container, VM).
I'm currently going through this process; if this is something you are interested in, I can respond with the additional steps spelled out as I make progress.
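As a rough sketch of that approach (all names here are placeholders, not the asker's actual project layout): put preprocessing.py inside a package directory, write a minimal setup.py, build a wheel or sdist, and `pip install` it on the remote machine.

```python
# setup.py -- hypothetical sketch; package name and dependencies are placeholders
from setuptools import setup, find_packages

setup(
    name="my_model_pkg",        # placeholder package name
    version="0.1.0",
    packages=find_packages(),   # picks up my_model_pkg/, which contains preprocessing.py
    install_requires=[
        "scikit-learn",         # runtime dependencies of the pipeline
    ],
)
```

One caveat: the pickle stores the module path as it was at dump time, so training must also import the class from the installed package (e.g. `from my_model_pkg.preprocessing import Preprocessor`), not from a loose local file; otherwise the reference inside the pickle will still point at a module the server does not have.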