I have built a classifier and I would like to save it for future use. The classifier includes different algorithms (logistic regression, naive bayes, svm):
X, y = tfidf(df, ngrams = 1)
X, y = under_sample.fit_resample(X, y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=40)
df_result = df_result.append(training_naive(X_train, X_test, y_train, y_test), ignore_index = True)
df_result = df_result.append(training_logreg(X_train, X_test, y_train, y_test), ignore_index = True)
df_result = df_result.append(training_svm(X_train, X_test, y_train, y_test), ignore_index = True)
This is the last step in my code, where I compare the different algorithms. training_svm, training_logreg and training_naive are functions. training_svm, for example, is defined as follows:
def training_svm(X_train_log, X_test_log, y_train_log, y_test_log):
    folds = StratifiedKFold(n_splits = 3, shuffle = True, random_state = 40)
    clf = svm.SVC(kernel='linear') # Linear Kernel
    clf.fit(X_train_log, y_train_log)
    res = pd.DataFrame(columns = ['Preprocessing', 'Model', 'Precision', 'Recall', 'F1-score', 'Accuracy'])
    y_pred = clf.predict(X_test_log)
    # scikit-learn metrics take (y_true, y_pred), in that order
    f1 = f1_score(y_test_log, y_pred, average = 'weighted')
    pres = precision_score(y_test_log, y_pred, average = 'weighted')
    rec = recall_score(y_test_log, y_pred, average = 'weighted')
    acc = accuracy_score(y_test_log, y_pred)
    res = res.append({'Model': f'SVM', 'Precision': pres,
                      'Recall': rec, 'F1-score': f1, 'Accuracy': acc}, ignore_index = True)
    return res
Since I would like to use and test it with new data, I was wondering how to save it and re-use it. I think I should do something like this:
import pickle
# save
with open('model.pkl', 'wb') as f:
    pickle.dump(clf, f)
# load
with open('model.pkl', 'rb') as f:
    clf2 = pickle.load(f)
clf2.predict(X[0:1])
Can you please explain how to extend it to my project?
As stated in the scikit-learn documentation:
It is possible to save a model in scikit-learn by using Python's built-in persistence model, namely pickle
Example:
from sklearn import svm
from sklearn import datasets
clf = svm.SVC()
X, y= datasets.load_iris(return_X_y=True)
clf.fit(X, y)
import pickle
s = pickle.dumps(clf)
clf2 = pickle.loads(s)
clf2.predict(X[0:1])
You can then do the same for every model in your code, for example by writing one prediction function per model, such as:
def predict_svm(to_predict):
    # 'your_svm_model' is a placeholder for the path of the saved model file
    with open('your_svm_model', 'rb') as f_input:
        clf = pickle.load(f_input) # maybe handled with a singleton to reduce loading for multiple predictions
    return clf.predict(to_predict)
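The "singleton" idea mentioned in the comment above could look roughly like the following. This is only a sketch: save_model, load_model and predict_with are hypothetical helper names, not part of scikit-learn, and it assumes each training function is changed to hand back its fitted clf (for instance by returning it alongside res) so that it can be saved.

```python
import pickle
from functools import lru_cache


def save_model(model, path):
    """Serialize a fitted estimator to disk with pickle."""
    with open(path, "wb") as f_output:
        pickle.dump(model, f_output)


@lru_cache(maxsize=None)
def load_model(path):
    """Load a pickled estimator from `path`.

    The result is cached per path, so repeated predictions
    do not re-read the file from disk.
    """
    with open(path, "rb") as f_input:
        return pickle.load(f_input)


def predict_with(path, to_predict):
    """Predict with the estimator stored at `path` (hypothetical helper)."""
    return load_model(path).predict(to_predict)
```

With this in place, something like save_model(clf, 'svm_model.pkl') after training and predict_with('svm_model.pkl', X_new) later would cover the SVM case, and the same two calls would work for the logistic regression and naive Bayes models with different file names.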
That said, the scikit-learn documentation suggests using joblib:
In the specific case of scikit-learn, it may be better to use joblib's replacement of pickle (dump & load), which is more efficient on objects that carry large numpy arrays internally as is often the case for fitted scikit-learn estimators, but can only pickle to the disk and not to a string:
from joblib import dump, load
dump(clf, 'filename.joblib')
clf = load('filename.joblib')
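One thing to keep in mind when predicting on new text: the new data has to go through the same fitted TF-IDF transformation as the training data, so the vectorizer needs to be saved too. A minimal sketch, assuming a plain TfidfVectorizer stands in for the question's tfidf helper (whose internals are not shown), with toy data and a hypothetical file name:

```python
from joblib import dump, load
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data standing in for the real corpus (hypothetical).
texts = ["good movie", "great film", "bad movie", "awful film"]
labels = [1, 1, 0, 0]

# Bundling the vectorizer with the classifier means new raw text is
# transformed exactly as the training text was.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 1)), LogisticRegression())
model.fit(texts, labels)

dump(model, "logreg_pipeline.joblib")      # save
restored = load("logreg_pipeline.joblib")  # load, e.g. in another script

new_texts = ["great movie"]
predictions = restored.predict(new_texts)
```

If the vectorizer is fitted separately from the classifier, as it appears to be in the question, it can be dumped and loaded in the same way as a second file.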