How to save a custom transformer in sklearn?

Question

I am not able to load an instance of a custom transformer saved using either sklearn.externals.joblib.dump or pickle.dump, because the original definition of the custom transformer is missing from the current Python session.

Suppose that in one Python session I define, create and save a custom transformer. It can also be loaded in the same session:

from sklearn.base import TransformerMixin
from sklearn.base import BaseEstimator
from sklearn.externals import joblib

class CustomTransformer(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X


custom_transformer = CustomTransformer()
joblib.dump(custom_transformer, 'custom_transformer.pkl')

loaded_custom_transformer = joblib.load('custom_transformer.pkl')

Opening a new Python session and loading from 'custom_transformer.pkl'

from sklearn.externals import joblib

joblib.load('custom_transformer.pkl')

raises the following exception:

AttributeError: module '__main__' has no attribute 'CustomTransformer'

The same thing is observed if joblib is replaced with pickle. Saving the custom transformer in one session with

import pickle

with open('custom_transformer_pickle.pkl', 'wb') as f:
    pickle.dump(custom_transformer, f, -1)

and loading it in another:

import pickle

with open('custom_transformer_pickle.pkl', 'rb') as f:
    loaded_custom_transformer_pickle = pickle.load(f)

raises the same exception.

In the above, if CustomTransformer is replaced with, say, sklearn.preprocessing.StandardScaler, the saved instance can be loaded in a new Python session.

Is it possible to save a custom transformer and load it later somewhere else?

Answer 1

sklearn.preprocessing.StandardScaler works because its class definition is available in the installed sklearn package, which joblib looks up when you load the pickle.
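To illustrate the lookup, here is a small sketch using only the standard library. fractions.Fraction is used as a stand-in for StandardScaler (an assumption made so the example runs without sklearn): the pickle stream records where the class lives, not the class's code, so loading only works when that module can be imported.

```python
import pickle
import pickletools
from fractions import Fraction  # stand-in for a class that lives in an installed module

# Protocol 2 is chosen here only because it records the class in a single GLOBAL opcode,
# which makes the reference easy to list.
payload = pickle.dumps(Fraction(1, 3), protocol=2)

# Collect the "module name" class references stored in the pickle stream.
class_refs = [arg for opcode, arg, _pos in pickletools.genops(payload)
              if opcode.name == "GLOBAL"]
print(class_refs)

# Loading re-imports the module named in the stream and rebuilds the object.
restored = pickle.loads(payload)
print(restored == Fraction(1, 3))
```

For a class defined interactively, the recorded module is __main__, which is why a fresh session that lacks the definition fails with the AttributeError shown in the question.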

You'll have to make your CustomTransformer class available in the new session, either by re-defining it or by importing it.
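A minimal sketch of the import route, under a few assumptions: the module name custom_transformers is hypothetical, the class is a plain Python stand-in so the example runs without sklearn (a BaseEstimator subclass is pickled by reference in the same way), and the "new session" is simulated by starting a fresh interpreter with subprocess.

```python
import os
import pickle
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical module holding the class; in a real project this is a file in your package.
MODULE_SOURCE = textwrap.dedent('''
    class CustomTransformer:
        """Stand-in for the transformer from the question."""
        def fit(self, X, y=None):
            return self
        def transform(self, X, y=None):
            return X
''')

module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "custom_transformers.py"), "w") as f:
    f.write(MODULE_SOURCE)

# Session 1: import the class from the module (not from __main__) and save an instance.
sys.path.insert(0, module_dir)
from custom_transformers import CustomTransformer

pkl_path = os.path.join(module_dir, "custom_transformer.pkl")
with open(pkl_path, "wb") as f:
    pickle.dump(CustomTransformer(), f)

# Session 2: a fresh interpreter that only loads the pickle.
LOAD_SCRIPT = (
    "import pickle, sys\n"
    "with open(sys.argv[1], 'rb') as f:\n"
    "    obj = pickle.load(f)\n"
    "print(type(obj).__name__)\n"
)
empty_dir = tempfile.mkdtemp()

def load_in_new_session(module_path=None):
    """Run the loader in a new interpreter, optionally making the module importable."""
    env = dict(os.environ)
    if module_path is not None:
        env["PYTHONPATH"] = module_path
    return subprocess.run([sys.executable, "-c", LOAD_SCRIPT, pkl_path],
                          cwd=empty_dir, env=env, capture_output=True, text=True)

with_module = load_in_new_session(module_dir)  # module importable: load succeeds
without_module = load_in_new_session()         # module missing: load fails
print(with_module.stdout.strip())
print(without_module.returncode != 0)
```

The point of the design is that the pickle now refers to custom_transformers.CustomTransformer rather than __main__.CustomTransformer, so any session that can import that module can load the file.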

Answer 2

It works for me if I pass my transform function to sklearn.preprocessing.FunctionTransformer() and save the model to a ".pk" file with dill.dump(), then load it back with dill.load().

Note: I have included the transform function in a sklearn pipeline together with my classifier.

Answer 1
4 2017-09-06 14:36:19

Answer 2
0 2019-03-18 14:43:17