How to pickle a sklearn pipeline for a multi-label classifier / one-vs-rest classifier?
I am trying to create a multi-label classifier using the OneVsRestClassifier wrapper.
I used a pipeline for TF-IDF and the classifier.
When fitting the pipeline, I have to loop through my data by category and fit the pipeline each time to make predictions for each category.
Now I want to export this the way one would usually export a fitted model using pickle or joblib.
Example:
pickle.dump(clf, open('clf.pickle', 'wb'))
How can I do this with the pipeline? Even if I pickle the pipeline, do I still need to fit the pipeline every time I want to predict on a new keyword?
Example:
pickle.dump(pipeline, open('pipeline.pickle', 'wb'))
pipeline = pickle.load(open('pipeline.pickle', 'rb'))
for category in categories:
    pipeline.fit(X_train, y_train[category])
    predict = pipeline.predict(['kiwi'])
    print(predict)
If I skip pipeline.fit(X_train, y_train[category]) after loading the pipeline, I only get a single-value array in predict. If I fit the pipeline, I get a three-value array.
Also, how can I incorporate the grid search into my pipeline for export?
raw_data
keyword          class1  class2  class3
"orange apple"   1       0       1
"lime lemon"     1       0       0
"banana"         0       1       0
categories = ['class1','class2','class3']
pipeline
SVC_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words=stop_words)),
    ('clf', OneVsRestClassifier(LinearSVC(), n_jobs=1)),
])
Grid search (I don't know how to incorporate this into the pipeline)
parameters = {'tfidf__ngram_range': [(1, 1), (1, 2)],
'tfidf__use_idf': (True, False),
'tfidf__max_df': [0.25, 0.5, 0.75, 1.0],
'tfidf__max_features': [10, 50, 100, 250, 500, 1000, None],
'tfidf__stop_words': ('english', None),
'tfidf__smooth_idf': (True, False),
'tfidf__norm': ('l1', 'l2', None),
}
grid = GridSearchCV(SVC_pipeline, parameters, cv=2, verbose=1)
grid.fit(X_train, y_train)
Fitting pipeline
for category in categories:
    print('... Processing {}'.format(category))
    SVC_pipeline.fit(X_train, y_train[category])
    # compute the testing accuracy
    prediction = SVC_pipeline.predict(X_test)
    print('Test accuracy is {}'.format(accuracy_score(y_test[category], prediction)))
OneVsRestClassifier internally fits one classifier per class, so you should not fit the pipeline separately for each class as you are doing in
for category in categories:
    pipeline.fit(X_train, y_train[category])
    predict = pipeline.predict(['kiwi'])
    print(predict)
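You should be doing something like this:
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
SVC_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),  # add your stop_words
    ('clf', OneVsRestClassifier(LinearSVC(), n_jobs=1)),
])
SVC_pipeline.fit(["apple", "boy", "cat"], np.array([[0, 1, 1], [1, 1, 0], [1, 1, 1]]))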
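To make the "one classifier per class" point concrete, here is a minimal sketch on toy data. The names X_toy, Y_toy and toy_pipeline are hypothetical and only loosely mirror the raw_data table from the question. It checks that a single fit produces one underlying estimator per label column and that predict returns one column per label.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy data (hypothetical): four keywords, three binary label columns.
X_toy = ["orange apple", "lime lemon", "banana", "apple banana"]
Y_toy = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
])

toy_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])

# A single fit call covers all three labels; no per-category loop is needed.
toy_pipeline.fit(X_toy, Y_toy)

# OneVsRestClassifier keeps one fitted LinearSVC per label column.
n_estimators = len(toy_pipeline.named_steps["clf"].estimators_)

# Predicting one keyword yields one row with one 0/1 entry per label.
pred_shape = toy_pipeline.predict(["kiwi"]).shape

print(n_estimators, pred_shape)
```

Assuming y_train is a DataFrame with the class1, class2 and class3 columns as in the question, passing y_train[categories] (all three columns at once) would play the role of Y_toy here.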
You can now save the model using
import pickle
pickle.dump(SVC_pipeline, open('pipeline.pickle', 'wb'))
Later you can load the model back and make predictions without refitting:
obj = pickle.load(open('pipeline.pickle', 'rb'))
obj.predict(["apple", "boy", "cat"])
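As a rough check that a reloaded pipeline does not need to be fitted again, the sketch below round-trips a fitted pipeline through both pickle and joblib (the question mentions either) and compares predictions. The toy data, the variable names and the temporary file paths are assumptions made for illustration only.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy multi-label data (hypothetical).
X_demo = ["orange apple", "lime lemon", "banana", "apple banana"]
Y_demo = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 0], [1, 1, 0]])

fitted = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
]).fit(X_demo, Y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    # pickle needs an open binary file handle, not a file name.
    pickle_path = os.path.join(tmp_dir, "pipeline.pickle")
    with open(pickle_path, "wb") as fh:
        pickle.dump(fitted, fh)
    with open(pickle_path, "rb") as fh:
        restored = pickle.load(fh)

    # joblib accepts a path directly.
    joblib_path = os.path.join(tmp_dir, "pipeline.joblib")
    joblib.dump(fitted, joblib_path)
    restored_joblib = joblib.load(joblib_path)

# The restored objects predict immediately; fit is never called on them.
new_keywords = ["kiwi", "apple"]
expected = fitted.predict(new_keywords)
same_pickle = np.array_equal(expected, restored.predict(new_keywords))
same_joblib = np.array_equal(expected, restored_joblib.predict(new_keywords))
print(same_pickle, same_joblib)
```

Files written this way should be loaded with a compatible scikit-learn version, and only from sources you trust, since unpickling can execute arbitrary code.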
You can binarise your multi-label targets using MultiLabelBinarizer before passing them to the fit method.
Sample:
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MultiLabelBinarizer
y = [['c1', 'c2'], ['c3'], ['c1'], ['c1', 'c3'], ['c1', 'c2', 'c3']]
mb = MultiLabelBinarizer()
y_encoded = mb.fit_transform(y)
SVC_pipeline.fit(["apple", "boy", "cat", "dog", "rat"], y_encoded)
# Grid search over the pipeline's parameters; the best pipeline is refit on all the data
grid = GridSearchCV(SVC_pipeline, {'tfidf__use_idf': (True, False)}, cv=2, verbose=1)
grid.fit(["apple", "boy", "cat", "dog", "rat"], y_encoded)
# Save the fitted grid search object
pickle.dump(grid, open('grid.pickle', 'wb'))
# Later load it back and make predictions
grid_obj = pickle.load(open('grid.pickle', 'rb'))
grid_obj.predict(["apple", "boy", "cat", "dog", "rat"])
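One possible refinement, sketched under assumptions rather than taken from the answer: export only the refit best pipeline (best_estimator_) together with the fitted MultiLabelBinarizer, so that predictions can be mapped back to label names after loading. The dict layout with "model" and "binarizer" keys, and all variable names, are hypothetical.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

docs = ["apple", "boy", "cat", "dog", "rat"]
labels = [["c1", "c2"], ["c3"], ["c1"], ["c1", "c3"], ["c1", "c2", "c3"]]

# Turn the label lists into a 0/1 indicator matrix with one column per label.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(labels)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])

# With the default refit=True, the best parameter setting is refit on all data.
search = GridSearchCV(pipeline, {"tfidf__use_idf": (True, False)}, cv=2)
search.fit(docs, Y)

# Bundle the best pipeline with the binarizer (hypothetical dict layout).
payload = pickle.dumps({"model": search.best_estimator_, "binarizer": binarizer})
bundle = pickle.loads(payload)

# Predict with the reloaded pipeline, then decode indicators to label names.
encoded = bundle["model"].predict(["apple", "rat"])
decoded = bundle["binarizer"].inverse_transform(encoded)

print(search.best_params_)
print(decoded)
```

Pickling the whole GridSearchCV object, as in the sample above, also works; keeping only best_estimator_ simply drops the search bookkeeping such as cv_results_. With a data set this small, scikit-learn may warn about labels that are constant within a training fold.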