How to save a custom transformer in sklearn?

I am not able to load an instance of a custom transformer saved with either sklearn.externals.joblib.dump or pickle.dump, because the original definition of the custom transformer is missing from the current Python session.

Suppose that in one Python session I define, create, and save a custom transformer. It can also be loaded in the same session:

from sklearn.base import BaseEstimator, TransformerMixin
import joblib  # formerly sklearn.externals.joblib, removed in scikit-learn 0.23

class CustomTransformer(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X


custom_transformer = CustomTransformer()
joblib.dump(custom_transformer, 'custom_transformer.pkl')

# Loading in the same session works because CustomTransformer is defined here.
loaded_custom_transformer = joblib.load('custom_transformer.pkl')

Opening a new Python session and loading from 'custom_transformer.pkl':

import joblib  # formerly sklearn.externals.joblib

joblib.load('custom_transformer.pkl')

raises the following exception:

AttributeError: module '__main__' has no attribute 'CustomTransformer'

The same thing happens if joblib is replaced with pickle. Saving the custom transformer in one session with

import pickle

with open('custom_transformer_pickle.pkl', 'wb') as f:
    pickle.dump(custom_transformer, f, pickle.HIGHEST_PROTOCOL)  # same as protocol -1

and loading it in another:

import pickle

with open('custom_transformer_pickle.pkl', 'rb') as f:
    loaded_custom_transformer_pickle = pickle.load(f)

raises the same exception.

In the above, if CustomTransformer is replaced with, say, sklearn.preprocessing.StandardScaler, the saved instance can be loaded in a new Python session without any problem.

Is it possible to save a custom transformer and load it later somewhere else?

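sklearn.preprocessing.StandardScaler works because its class definition is available in the installed sklearn package. Pickle (and joblib, which builds on pickle) does not store a class's code, only a reference to it: the module path and the class name. On loading, it imports that module and looks the name up. For StandardScaler the reference points into sklearn, which is importable wherever sklearn is installed; for a class defined in an interactive session or a script, the reference is __main__.CustomTransformer, which does not exist in a fresh session, hence the AttributeError.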
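
The following standard-library sketch makes the stored reference visible and reproduces the error. The class is a plain stand-in for CustomTransformer (the sklearn base classes are left out, since they do not change how the class is referenced), and removing the class from its module simulates opening a new session.

```python
import pickle
import pickletools
import sys


class CustomTransformer:
    """Plain stand-in for the question's transformer (no sklearn base classes)."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


data = pickle.dumps(CustomTransformer())

# Collect every string stored in the pickle: the module path and class name
# appear, but nothing from the bodies of fit() or transform().
module_name = CustomTransformer.__module__  # "__main__" when run as a script
stored_strings = [
    arg for _op, arg, _pos in pickletools.genops(data) if isinstance(arg, str)
]
print(stored_strings)

# Simulate a fresh session: take the class out of its module, then load.
module = sys.modules[module_name]
original_class = module.CustomTransformer
del module.CustomTransformer
error = None
try:
    pickle.loads(data)
except AttributeError as exc:
    error = exc
finally:
    module.CustomTransformer = original_class  # put the class back

print(type(error).__name__, error)

# Once the class is available again under the same name, loading succeeds.
restored = pickle.loads(data)
print(restored.transform([1, 2, 3]))
```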

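You'll have to make your CustomTransformer class available in the new session, either by re-defining it or by importing it before loading. The more robust option is to define the class in its own module and import it from there both when saving and when loading, so that the pickle refers to that module rather than to __main__.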
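
A minimal sketch of the module-based approach, assuming the class lives in a module called custom_transformers (a hypothetical name). To stay self-contained, it writes that module to a temporary directory and runs the loading step in a separate interpreter, which stands in for the later session.

```python
import importlib
import os
import pathlib
import pickle
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical module holding the transformer; in a real project this would
# be a file such as custom_transformers.py kept alongside your code.
MODULE_SOURCE = textwrap.dedent('''
    class CustomTransformer:
        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return X
''')

workdir = pathlib.Path(tempfile.mkdtemp())
(workdir / "custom_transformers.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()
custom_transformers = importlib.import_module("custom_transformers")

# Saving session: the pickle refers to custom_transformers.CustomTransformer.
pkl_path = workdir / "custom_transformer.pkl"
with open(pkl_path, "wb") as f:
    pickle.dump(custom_transformers.CustomTransformer(), f, pickle.HIGHEST_PROTOCOL)

# Loading session: a separate interpreter that never defines or imports the
# class itself. pickle imports custom_transformers by name, which works
# because PYTHONPATH puts workdir on that interpreter's module search path.
LOADER = textwrap.dedent(f'''
    import pickle
    with open({str(pkl_path)!r}, "rb") as f:
        obj = pickle.load(f)
    print(type(obj).__module__, type(obj).__name__)
''')
result = subprocess.run(
    [sys.executable, "-c", LOADER],
    env={**os.environ, "PYTHONPATH": str(workdir)},
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Because the saving session took the class from custom_transformers, the loading session only needs that module to be importable; it does not have to repeat the class definition.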

It works for me if I pass my transform function to sklearn.preprocessing.FunctionTransformer() and save the model to a ".pk" file with dill.dump(), then load it with dill.load().

Note: I included the transform function in an sklearn pipeline together with my classifier.
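
The answer does not include code, so the following is only a minimal sketch of that approach, assuming scikit-learn, NumPy and dill are installed. The function name square_features, the toy data and the file name model.pk are hypothetical. Each session runs as its own Python process, so the loading process really never sees the function definition.

```python
import pathlib
import subprocess
import sys
import tempfile
import textwrap

# Assumes scikit-learn, NumPy and dill are installed; names are illustrative.
model_path = pathlib.Path(tempfile.mkdtemp()) / "model.pk"

# Session 1: square_features exists only in this process's __main__.
SAVE_SCRIPT = textwrap.dedent(f'''
    import dill
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import FunctionTransformer

    def square_features(X):
        return X * X

    X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 2.0]])
    y = np.array([0, 0, 1, 1])
    model = make_pipeline(FunctionTransformer(square_features), LogisticRegression())
    model.fit(X, y)

    with open({str(model_path)!r}, "wb") as f:
        dill.dump(model, f)
''')

# Session 2: square_features is never defined here.
LOAD_SCRIPT = textwrap.dedent(f'''
    import dill
    import numpy as np

    with open({str(model_path)!r}, "rb") as f:
        model = dill.load(f)
    print(model.predict(np.array([[0.0, 1.0], [3.0, 2.0]])).tolist())
''')

for script in (SAVE_SCRIPT, LOAD_SCRIPT):
    result = subprocess.run(
        [sys.executable, "-c", script], capture_output=True, text=True, check=True
    )
print(result.stdout.strip())
```

This works because dill, unlike the standard pickle module, serializes functions (and classes) defined in __main__ by value, storing their code in the file instead of a name reference.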

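Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.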

© 2020-2024 STACKOOM.COM