How do I pickle an sklearn Pipeline object?

Question

I am trying to save a pipeline, but I can't. This is my class, which I have already tried to pickle.

class SentimentModel():

    def __init__(self,model_instance,x_train,x_test,y_train,y_test):
        import string
        from nltk import ngrams
        self.ngrams = ngrams
        self.string = string
        self.model = model_instance
        self.x_train = x_train
        self.x_test = x_test
        self.y_train = y_train
        self.y_test = y_test       
        self._fit()


    def _fit(self):
        from sklearn.pipeline import Pipeline
        from sklearn.feature_extraction.text import TfidfTransformer
        from sklearn.feature_extraction.text import CountVectorizer      

        self.pipeline = Pipeline([
            ('bow', CountVectorizer(analyzer=self._text_process)), 
            ('tfidf', TfidfTransformer()), 
            ('classifier', self.model), 
        ])
        self.pipeline.fit(self.x_train,self.y_train)
        self.preds = self.pipeline.predict(self.x_test)     

    def _text_process(self,text):
        def remove_non_ascii(text):
            return ''.join(i for i in text if ord(i)<128)        

        text = remove_non_ascii(text)
        text = [char.lower() for char in text if char not in self.string.punctuation]
        text = ''.join(text)
        unigrams = [word for word in text.split()]
        bigrams = [' '.join(g) for g in self.ngrams(unigrams,2)]
        trigrams = [' '.join(g) for g in self.ngrams(unigrams,3)]
        tokens = []
        tokens.extend(unigrams+bigrams+trigrams)
        return tokens        

    def predict(self,observation):
        return self.pipeline.predict(observation)

I get these errors:

from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
nb_model = SentimentModel(nb,X_train,X_test,y_train,y_test)

import pickle
with open('nb_model1.pkl','wb') as f:
    pickle.dump(nb_model,f)

>>>
TypeError: can't pickle module objects

Likewise:

with open('nb_model1.pkl','wb') as f:
    pickle.dump(nb_model.pipeline,f)

TypeError: can't pickle module objects

However, I can save nb_model.model, just not the pipeline object. What explains this? How can I persist my whole pipeline?

I have seen "How to pickle individual steps in an sklearn pipeline?", but the problem is that the bow step cannot be pickled.

joblib.dump(nb_model.pipeline.get_params()['tfidf'], 'nb_tfidf.pkl') # pass
joblib.dump(nb_model.pipeline.get_params()['bow'], 'nb_bow.pkl') # fail
joblib.dump(nb_model.pipeline.get_params()['classifier'], 'nb_classifier.pkl') #pass

>>>
TypeError: can't pickle module objects

What should I do?

Answer 1

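Try again without importing modules inside the class definition and storing them on the instance. self.string = string keeps a module object as an instance attribute, and pickle cannot serialize module objects. The bow step is built with analyzer=self._text_process, a bound method that carries a reference to self, so pickling either nb_model or nb_model.pipeline ends up trying to pickle self.string and fails. nb_model.model and the tfidf step hold no reference to self, which is why they save fine.

It is also not good practice in general. When you attach imported code to your object like this, the pickle depends on packages (nltk here) that may not even be installed on another machine that wants to load it. Maybe pickle is protecting you from doing that.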
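As a rough illustration of the point above, here is a minimal, stdlib-only sketch. BrokenProcessor, TextProcessor and can_pickle are hypothetical names made up for this example, and zip() stands in for nltk.ngrams so the snippet runs without extra packages. It is not the asker's full class, only the part that decides whether pickling works.

```python
import pickle
import string  # imported at module level, never stored on an instance


class BrokenProcessor:
    """Mirrors the question: keeps a module object as an instance attribute."""

    def __init__(self):
        self.string = string  # a module object; pickle cannot serialize it


class TextProcessor:
    """Hypothetical stand-in for SentimentModel._text_process."""

    def text_process(self, text):
        # Drop non-ASCII characters, then punctuation, and lowercase the rest.
        text = ''.join(ch for ch in text if ord(ch) < 128)
        text = ''.join(ch.lower() for ch in text if ch not in string.punctuation)
        unigrams = text.split()
        # Assumption: zip() replaces nltk.ngrams to keep the sketch stdlib-only.
        bigrams = [' '.join(g) for g in zip(unigrams, unigrams[1:])]
        trigrams = [' '.join(g) for g in zip(unigrams, unigrams[1:], unigrams[2:])]
        return unigrams + bigrams + trigrams


def can_pickle(obj):
    """Return True if pickle.dumps succeeds, False if it raises TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


# The instance holding a module fails to pickle.
broken_ok = can_pickle(BrokenProcessor())

# A bound method of a module-free instance survives a pickle round trip.
analyzer = TextProcessor().text_process
restored = pickle.loads(pickle.dumps(analyzer))
tokens = restored("Good movie, really!")
```

In the asker's class the same idea would mean moving import string and from nltk import ngrams to the top of the file, dropping self.string and self.ngrams, and calling string.punctuation and ngrams directly inside _text_process. The machine that loads the pickle still needs the class definition to be importable, along with sklearn and nltk.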

Solution 1
2, accepted, 2020-05-16 02:10:03