
Is there a way to pickle a fasttext model/object?

I just trained my fasttext model and I am trying to pin it using pins (https://pypi.org/project/pins/) and vetiver (https://pypi.org/project/vetiver/) for version control.
However, for that to happen I need to pickle the fasttext object/model. And that is where I am struggling.
PS: When I save the fasttext model to disk, it saves as a .bin (binary) file. Here is how the code looks when using pins:

import pins
import fasttext
board = pins.board_temp(allow_pickle_read=True)
board.pin_write(ft_model, "ft_model", type="joblib")  # ft_model is a fasttext model I already trained

The error I get from running these lines is: cannot pickle 'fasttext_pybind.fasttext' object

The same happens when I use vetiver:

import vetiver
import fasttext
import pins
from vetiver.handlers.base import BaseHandler  # BaseHandler must be imported from vetiver's handler module

class FasttextHandler(BaseHandler):
    def __init__(self, model, ptype_data):
        super().__init__(model, ptype_data)

handled_model = FasttextHandler(model=ft_model, ptype_data=None)  # ft_model is the trained fasttext model
vetiver_fasttext_model = vetiver.VetiverModel(model=handled_model, model_name="model")
ft_board = pins.board_temp(allow_pickle_read=True)
vetiver.vetiver_pin_write(ft_board, vetiver_fasttext_model)

Again, the error I get for this snippet of code is cannot pickle 'fasttext_pybind.fasttext' object

I appreciate any help or any tips,

Thank you kindly!

Jamal

The official Facebook fasttext module relies on Facebook's non-Python implementation and storage format, so that is likely the pickle-resistant barrier you're hitting.

If you're not using the --supervised classification mode, the all-Python-and-Cython Gensim library includes a FastText model class that does everything except that mode. It can also load and save Facebook-format models.

While Gensim's own native .save() operation uses a mixture of pickling and raw numpy array files, for historical and efficiency reasons, its models should also be amenable to complete pickling (on recent Pythons, and if your project can tolerate the full overhead).

If you still need features from Facebook's fasttext, such as the supervised mode, you might have to wrap its native objects, which contain the unpicklable parts, in proxy objects that intercept pickle-serialization attempts and leverage the native custom format to simulate picklability.

For example, on serialization, ask the wrapped object to write itself out in its usual way, then pickle-serialize the entire raw native file as a single raw-data field of your wrapper object. On deserialization, explicitly take that large raw-file field, write it to disk, then use the wrapped class's native load.

It'd be rather slow and ugly, and it would involve a large amount of extra temporary addressable memory usage while marshalling between the two serialization formats. But if you have no other option, and your systems can tolerate the delay and memory usage, it would let you use native fasttext models in your desired pins/vetiver-based architecture.
