
Deploy sklearn model with custom transformer

I have a sklearn pipeline that has been defined in the following way:

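from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

from tools.transformers import MyTransformer  # custom code in src/tools/transformers.py

...

pipe = Pipeline([
    ('mytransformer', MyTransformer()),  # custom transformer step
    ('lm', LinearRegression())
])

...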

The structure of my code is:

src
├── __init__.py
├── train.py
└── tools
    └── transformers.py

I have trained my model, and my pipeline is saved in a .joblib file. Now I want to use my model in another project. However, I need to move not only the .joblib file but also the whole tools/transformers.py structure. I think this is rather difficult to maintain and hard to understand.
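The reason the module has to travel with the file: joblib builds on pickle, and pickle records a custom class only by reference (its module path and class name), never its source code. A minimal stdlib-only sketch of that behaviour, using a hypothetical in-memory module named fake_tools_transformers in place of tools/transformers.py:

```python
import pickle
import sys
import types

# Hypothetical stand-in for tools/transformers.py: a module built in memory
# so the example needs no files on disk.
mod = types.ModuleType("fake_tools_transformers")
exec(
    "class MyTransformer:\n"
    "    def transform(self, X):\n"
    "        return X\n",
    mod.__dict__,
)
sys.modules[mod.__name__] = mod

payload = pickle.dumps(mod.MyTransformer())

# The pickle holds a reference (module name + class name), not the class code.
print(b"fake_tools_transformers" in payload)  # True
print(b"def transform" in payload)            # False

# Simulate the other project, where the module cannot be imported.
del sys.modules[mod.__name__]
load_error = None
try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    load_error = exc
    print("load failed:", exc)
```

The same lookup by name happens when a .joblib pipeline containing MyTransformer is loaded, which is why tools/transformers.py must be importable under exactly the same dotted path.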

Is there an easier way to make the pipeline work without having to move the code around in exactly the same structure?

You need to create a separate project, for instance internal_lib, and move into it all the custom logic that you use across your projects. Then install internal_lib as part of your Python environment (via pip or conda). After that, you will be able to pickle a trained pipeline and reuse it in another project.

Technically, it can be implemented as a private GitHub repo and installed via pip. Here are a couple of links on how to implement it: one, two.
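Why installing the shared code helps: when loading, pickle only needs the class's dotted path (here internal_lib.transformers.MyTransformer) to be importable, no matter which project does the loading. A minimal stdlib-only sketch, assuming a hypothetical internal_lib/transformers.py layout; instead of running pip, it puts the package directory on sys.path, which has the same effect on imports for this demonstration:

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical shared package written to a temporary directory.
site_dir = Path(tempfile.mkdtemp())
pkg_dir = site_dir / "internal_lib"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "transformers.py").write_text(
    "class MyTransformer:\n"
    "    def transform(self, X):\n"
    "        return X\n"
)

# Stand-in for `pip install internal_lib`: make the package importable.
sys.path.insert(0, str(site_dir))
importlib.invalidate_caches()

MyTransformer = importlib.import_module("internal_lib.transformers").MyTransformer
payload = pickle.dumps(MyTransformer())

# Drop the cached modules, as a fresh interpreter in another project would start.
for name in ("internal_lib.transformers", "internal_lib"):
    del sys.modules[name]

# Loading re-imports internal_lib.transformers by name; no copied folder needed.
restored = pickle.loads(payload)
print(type(restored).__module__)  # internal_lib.transformers
```

With a real install, the training project and the serving project both import from internal_lib, so the pickled reference resolves in either place.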

You should be able to use cloudpickle to ensure that your custom module (tools/transformers.py) is serialized into the pickle file itself, so it does not need to be present when the file is loaded.

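import cloudpickle

import tools.transformers  # module that defines MyTransformer

# register_pickle_by_value expects a module object (not a class); everything
# defined in that module is then stored by value inside the pickle.
cloudpickle.register_pickle_by_value(tools.transformers)

with open('./Pipe.cloudpkl', mode='wb') as file:
    cloudpickle.dump(obj=pipe, file=file)  # pipe is the trained Pipeline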
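To check that the class really travels inside the file, here is a small round trip. It assumes cloudpickle 2.0 or later (where register_pickle_by_value was introduced) is installed, and uses a hypothetical in-memory module named fake_tools_transformers in place of tools/transformers.py; deleting it from sys.modules simulates the other project. The loading side uses the standard pickle.loads, but cloudpickle itself (and, for a real pipeline, scikit-learn) must still be installed there.

```python
import pickle
import sys
import types

import cloudpickle

# Hypothetical stand-in for tools/transformers.py, built in memory.
mod = types.ModuleType("fake_tools_transformers")
exec(
    "class MyTransformer:\n"
    "    def transform(self, X):\n"
    "        return X\n",
    mod.__dict__,
)
sys.modules[mod.__name__] = mod  # register_pickle_by_value needs an imported module

cloudpickle.register_pickle_by_value(mod)
payload = cloudpickle.dumps(mod.MyTransformer())

# Simulate the other project: the module is no longer importable.
del sys.modules[mod.__name__]

# The class definition is rebuilt from the payload itself.
restored = pickle.loads(payload)
print(restored.transform([1, 2, 3]))  # [1, 2, 3]
```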

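Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.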

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM