How to pickle individual steps in sklearn's Pipeline?

I am using Pipeline from sklearn to classify text.

In this example Pipeline, the steps are a FeatureUnion that wraps a TfidfVectorizer and some custom features, followed by a classifier. I then fit the training data and predict:

from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

X = ['I am a sentence', 'an example']
Y = [1, 2]
X_dev = ['another sentence']

# classifier
LinearSVC1 = LinearSVC(tol=1e-4, C=0.1)

pipeline = Pipeline([
    ('features', FeatureUnion([
        ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
        # CustomFeatures is the asker's own transformer (definition not shown)
        ('custom_features', CustomFeatures()),
    ])),
    ('clf', LinearSVC1),
])

pipeline.fit(X, Y)
y_pred = pipeline.predict(X_dev)

# etc.

Here I need to pickle the TfidfVectorizer step and leave custom_features unpickled, since I am still experimenting with them. The idea is to make the pipeline faster by pickling the tfidf step.

I know I can pickle the whole Pipeline with joblib.dump, but how do I pickle individual steps?

To pickle the TfidfVectorizer, you could use:

import joblib  # dump_path is any writable file path
joblib.dump(pipeline.steps[0][1].transformer_list[0][1], dump_path)

or:

joblib.dump(pipeline.get_params()['features__tfidf'], dump_path)
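
As a rough check, the sketch below fits a small pipeline and confirms that both access paths point at the same fitted vectorizer before dumping it. The TextLength class is a hypothetical stand-in for the asker's CustomFeatures, and the temporary file name is an assumption made only for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC


class TextLength(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for CustomFeatures: one column holding text length."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.array([[len(text)] for text in X], dtype=float)


X = ['I am a sentence', 'an example']
Y = [1, 2]

pipeline = Pipeline([
    ('features', FeatureUnion([
        ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
        ('custom_features', TextLength()),
    ])),
    ('clf', LinearSVC(tol=1e-4, C=0.1)),
])
pipeline.fit(X, Y)

# Two ways to reach the fitted vectorizer: by position and by parameter name.
by_index = pipeline.steps[0][1].transformer_list[0][1]
by_name = pipeline.get_params()['features__tfidf']
same_object = by_index is by_name

# Dump only the tfidf step; the custom features stay out of the pickle.
dump_path = os.path.join(tempfile.mkdtemp(), 'tfidf.joblib')
joblib.dump(by_index, dump_path)
```

The name-based lookup is usually less fragile, since it keeps working if the order of the steps changes.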

To load the dumped object, you can use:

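# transformer_list holds (name, transformer) tuples, which are immutable, so replace the whole tuple
pipeline.steps[0][1].transformer_list[0] = ('tfidf', joblib.load(dump_path))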
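
A minimal round-trip sketch, under the same assumptions as the question: the fitted vectorizer is dumped from one pipeline and loaded into a freshly built one. TextLength and build_pipeline are hypothetical names introduced here, and because transformer_list holds (name, transformer) tuples, the sketch replaces the whole tuple rather than assigning into it.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC


class TextLength(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for CustomFeatures: one column holding text length."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.array([[len(text)] for text in X], dtype=float)


def build_pipeline():
    """Build an unfitted pipeline shaped like the one in the question."""
    return Pipeline([
        ('features', FeatureUnion([
            ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
            ('custom_features', TextLength()),
        ])),
        ('clf', LinearSVC(tol=1e-4, C=0.1)),
    ])


X = ['I am a sentence', 'an example']
Y = [1, 2]
X_dev = ['another sentence']

# Fit once and dump only the tfidf step.
fitted = build_pipeline().fit(X, Y)
dump_path = os.path.join(tempfile.mkdtemp(), 'tfidf.joblib')
joblib.dump(fitted.steps[0][1].transformer_list[0][1], dump_path)

# Later: build a new pipeline and drop the saved vectorizer into it.
fresh = build_pipeline()
union = fresh.steps[0][1]
union.transformer_list[0] = ('tfidf', joblib.load(dump_path))

# The loaded vectorizer is already fitted and can transform new text.
loaded_tfidf = union.transformer_list[0][1]
dev_matrix = loaded_tfidf.transform(X_dev)
```

Note that calling fit on the new pipeline refits every step, including the loaded vectorizer, so the saved state only helps where the vectorizer is used for transform without refitting.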

Unfortunately, you can't use set_params, the inverse of get_params, to insert the estimator by name. That will become possible if the changes in PR#1769 (enable setting pipeline components as parameters) are ever merged!
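
Whether or not set_params supports this in your scikit-learn version (newer releases may accept it directly, so check your version), a by-name replacement can be sketched with a small helper. The function replace_transformer below is hypothetical and not part of scikit-learn; it only rebuilds the FeatureUnion's transformer_list.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion


def replace_transformer(union, name, new_transformer):
    """Swap the transformer registered under `name` in a FeatureUnion (hypothetical helper)."""
    names = [n for n, _ in union.transformer_list]
    if name not in names:
        raise KeyError(name)
    union.transformer_list = [
        (n, new_transformer if n == name else t)
        for n, t in union.transformer_list
    ]


X = ['I am a sentence', 'an example']

# CountVectorizer stands in for the custom features here (assumption).
union = FeatureUnion([
    ('tfidf', TfidfVectorizer()),
    ('custom_features', CountVectorizer()),
])

# In practice this object would come from joblib.load(dump_path).
prefitted = TfidfVectorizer(ngram_range=(1, 3), max_features=4000).fit(X)

replace_transformer(union, 'tfidf', prefitted)
replaced = dict(union.transformer_list)['tfidf']
```

With the pipeline from the question, the union would be reached as pipeline.named_steps['features'].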
