How to pickle individual steps in sklearn's Pipeline?
I am using Pipeline from sklearn to classify text.
In this example Pipeline, the steps are a FeatureUnion, which wraps a TfidfVectorizer and some custom features, followed by a classifier. I then fit the training data and do the prediction:
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

X = ['I am a sentence', 'an example']
Y = [1, 2]
X_dev = ['another sentence']

# classifier
LinearSVC1 = LinearSVC(tol=1e-4, C=0.1)

pipeline = Pipeline([
    ('features', FeatureUnion([
        ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
        ('custom_features', CustomFeatures())])),
    ('clf', LinearSVC1),
])

pipeline.fit(X, Y)
y_pred = pipeline.predict(X_dev)
# etc.
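CustomFeatures is not defined in the snippet above, so it will not run as posted. A minimal stand-in, assuming CustomFeatures is an ordinary scikit-learn transformer, might look like the sketch below. The character-count feature is purely hypothetical and only illustrates the fit/transform interface that FeatureUnion expects.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class CustomFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: one feature per document, its character count."""

    def fit(self, X, y=None):
        # Nothing to learn for this toy feature.
        return self

    def transform(self, X):
        # Return a 2-D array of shape (n_documents, 1).
        return np.array([[len(text)] for text in X], dtype=float)


features = CustomFeatures().fit_transform(['I am a sentence', 'an example'])
```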
Here I need to pickle the TfidfVectorizer step and leave the custom_features unpickled, since I am still experimenting with them. The idea is to make the pipeline faster by pickling the tfidf step.
I know I can pickle the whole Pipeline with joblib.dump, but how do I pickle individual steps?
To pickle the TfidfVectorizer, you could use:
joblib.dump(pipeline.steps[0][1].transformer_list[0][1], dump_path)
or:
joblib.dump(pipeline.get_params()['features__tfidf'], dump_path)
To load the dumped object, you can use:
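# each entry of transformer_list is a (name, transformer) tuple, so replace the whole tuple
pipeline.steps[0][1].transformer_list[0] = ('tfidf', joblib.load(dump_path))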
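Put together, the dump and load steps can be sketched as a round trip. This is a simplified version of the question's pipeline: the custom_features step is left out because its definition is not shown, the temporary directory and the file name tfidf.joblib are assumptions, and joblib is imported as a standalone package (older scikit-learn versions shipped it as sklearn.externals.joblib).

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

X = ['I am a sentence', 'an example']
Y = [1, 2]

# Simplified pipeline: the question's custom_features step is omitted.
pipeline = Pipeline([
    ('features', FeatureUnion([
        ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
    ])),
    ('clf', LinearSVC(tol=1e-4, C=0.1)),
])
pipeline.fit(X, Y)

union = pipeline.named_steps['features']
name, fitted_tfidf = union.transformer_list[0]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical dump location.
    dump_path = os.path.join(tmp_dir, 'tfidf.joblib')

    # Dump only the fitted tfidf step.
    joblib.dump(fitted_tfidf, dump_path)

    # Load it back and replace the whole (name, transformer) tuple,
    # since tuples do not support item assignment.
    restored_tfidf = joblib.load(dump_path)
    union.transformer_list[0] = (name, restored_tfidf)

# The restored step carries the same fitted vocabulary.
same_vocabulary = restored_tfidf.vocabulary_ == fitted_tfidf.vocabulary_
y_pred = pipeline.predict(['another sentence'])
```

Note that calling pipeline.fit again would refit the restored vectorizer, so the restored step only saves work when the pipeline is used for transform or predict.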
Unfortunately, you can't use set_params, the inverse of get_params, to insert the estimator by name. You will be able to if the changes in PR #1769 (enable setting pipeline components as parameters) are ever merged!
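Later scikit-learn releases do let set_params replace a Pipeline or FeatureUnion component by name, so on a recent version something like the following sketch should work. It assumes such a version is installed, and the replacement vectorizer is only a stand-in for an object loaded with joblib.load(dump_path).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ('features', FeatureUnion([
        ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
    ])),
    ('clf', LinearSVC(tol=1e-4, C=0.1)),
])

# Stand-in for a vectorizer restored with joblib.load(dump_path).
replacement = TfidfVectorizer(ngram_range=(1, 2))

# Assumes a scikit-learn version whose set_params can replace steps by name.
pipeline.set_params(features__tfidf=replacement)

# get_params returns the very object that was just inserted.
swapped = pipeline.get_params()['features__tfidf'] is replacement
```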