
How to pickle sklearn Pipeline object?

I'm trying to save a pipeline, but I can't. Here's my class, which I've tried pickling.

class SentimentModel():

    def __init__(self,model_instance,x_train,x_test,y_train,y_test):
        import string
        from nltk import ngrams
        self.ngrams = ngrams
        self.string = string
        self.model = model_instance
        self.x_train = x_train
        self.x_test = x_test
        self.y_train = y_train
        self.y_test = y_test       
        self._fit()


    def _fit(self):
        from sklearn.pipeline import Pipeline
        from sklearn.feature_extraction.text import TfidfTransformer
        from sklearn.feature_extraction.text import CountVectorizer      

        self.pipeline = Pipeline([
            ('bow', CountVectorizer(analyzer=self._text_process)), 
            ('tfidf', TfidfTransformer()), 
            ('classifier', self.model), 
        ])
        self.pipeline.fit(self.x_train,self.y_train)
        self.preds = self.pipeline.predict(self.x_test)     

    def _text_process(self,text):
        def remove_non_ascii(text):
            return ''.join(i for i in text if ord(i)<128)        

        text = remove_non_ascii(text)
        text = [char.lower() for char in text if char not in self.string.punctuation]
        text = ''.join(text)
        unigrams = [word for word in text.split()]
        bigrams = [' '.join(g) for g in self.ngrams(unigrams,2)]
        trigrams = [' '.join(g) for g in self.ngrams(unigrams,3)]
        tokens = []
        tokens.extend(unigrams+bigrams+trigrams)
        return tokens        

    def predict(self,observation):
        return self.pipeline.predict(observation)

And I get these errors:

from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
nb_model = SentimentModel(nb,X_train,X_test,y_train,y_test)

import pickle
with open('nb_model1.pkl','wb') as f:
    pickle.dump(nb_model,f)

>>>
TypeError: can't pickle module objects

Likewise:

with open('nb_model1.pkl','wb') as f:
    pickle.dump(nb_model.pipeline,f)

TypeError: can't pickle module objects

I can, however, save nb_model.model, but not the pipeline object. What's the explanation? How do I make my whole pipeline persist?

I've seen "How to pickle individual steps in sklearn's Pipeline?", but the problem is that the bow step can't be pickled.

joblib.dump(nb_model.pipeline.get_params()['tfidf'], 'nb_tfidf.pkl') # pass
joblib.dump(nb_model.pipeline.get_params()['bow'], 'nb_bow.pkl') # fail
joblib.dump(nb_model.pipeline.get_params()['classifier'], 'nb_classifier.pkl') #pass

>>>
TypeError: can't pickle module objects

What should I do?

Try it again without importing modules inside your class and storing them as instance attributes. The line self.string = string puts a module object on the instance, and pickle cannot serialize module objects. The bow step fails because its analyzer is the bound method self._text_process, so pickling it pulls in the whole SentimentModel instance, self.string included. Import string and ngrams at the top of the file and refer to them directly instead. It's also not good practice in general: a pickle stores only references to imported code, so any other machine that loads it still needs the same packages (nltk, scikit-learn) installed.
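To illustrate the cause, here is a minimal sketch that uses only the standard library. BrokenProcessor, FixedProcessor and can_pickle are hypothetical names made up for this example; they stand in for the relevant part of SentimentModel and are not taken from the question. The sketch assumes that pickling a vectorizer ends up pickling whatever callable was passed as analyzer, which is why a bound method is pickled directly here.

```python
import pickle
import string  # module-level import: pickle never has to serialize it


class BrokenProcessor:
    """Hypothetical stand-in for the question's class: module kept on self."""

    def __init__(self):
        self.string = string  # a module object now lives in the instance dict

    def text_process(self, text):
        return [c for c in text.lower() if c not in self.string.punctuation]


class FixedProcessor:
    """Same behaviour, but it uses the module-level name directly."""

    def text_process(self, text):
        return [c for c in text.lower() if c not in string.punctuation]


def can_pickle(obj):
    """Return True if pickle.dumps succeeds, False if it raises TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


# Pickling a bound method also pickles the instance it is bound to.
broken_ok = can_pickle(BrokenProcessor().text_process)
fixed_ok = can_pickle(FixedProcessor().text_process)

# The fixed bound method survives a round trip and still works.
restored = pickle.loads(pickle.dumps(FixedProcessor().text_process))
tokens = restored("Hi!")

print(broken_ok, fixed_ok, tokens)
```

Applied to the class in the question, the same change would mean moving import string and from nltk import ngrams to the top of the file, deleting self.string and self.ngrams, and calling string.punctuation and ngrams(...) directly inside _text_process.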

