
Apache Flink load ML model from file

I'd like to know if there is a way (or some sort of code example) to load an encoded pre-trained model (written in Python) inside a Flink streaming application, so that I can fit the model using the weights loaded from the file system and the data coming from the stream.

Thank you in advance.

You can do this in a number of different ways. Generally, the simplest approach is to invoke the code that downloads the model from some external storage (S3, for example) in the open method of your function. Then you can use the library of your choice to load the pre-trained weights and process the data. You can look for some inspiration here; that code loads a model serialized with protobuf and read from Kafka, but you can use it to understand the principles.
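A minimal sketch of that idea in Java, assuming the weights are just a flat file of doubles at a hypothetical path (`/opt/models/model.bin`); the decoding and scoring logic are placeholders for whatever serialization format and library your model actually uses:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Loads pre-trained model weights once per task in open(), then scores
 * each incoming record in map().
 */
public class ModelScoringFunction extends RichMapFunction<double[], Double> {

    // Hypothetical location of the serialized weights; you could also
    // download them from S3/HTTP here instead of reading the local FS.
    private static final String MODEL_PATH = "/opt/models/model.bin";

    private transient double[] weights;

    @Override
    public void open(Configuration parameters) throws Exception {
        byte[] raw = Files.readAllBytes(Paths.get(MODEL_PATH));
        // Deserialize into whatever structure your model needs;
        // here we pretend the file is a flat array of doubles.
        weights = decode(raw);
    }

    @Override
    public Double map(double[] features) {
        // Placeholder scoring: dot product of features and weights.
        double score = 0.0;
        for (int i = 0; i < features.length && i < weights.length; i++) {
            score += features[i] * weights[i];
        }
        return score;
    }

    private static double[] decode(byte[] raw) {
        java.nio.DoubleBuffer buf =
            java.nio.ByteBuffer.wrap(raw).asDoubleBuffer();
        double[] out = new double[buf.remaining()];
        buf.get(out);
        return out;
    }
}
```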

Normally I wouldn't recommend reading the model from the file system, as it's much less flexible and more troublesome to maintain. But that can be possible too, depending on your infrastructure setup. The only thing, in that case, would be to make sure that the file with the model is available on the machine the pipeline will run on.
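One way to avoid pre-installing the file on every machine is Flink's distributed cache, which ships a registered file to each TaskManager. A rough sketch, where the model URI, the cache name "model", and the scoring logic are all placeholders:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.io.File;

public class DistributedCacheExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Register the model file once on the client; Flink distributes it
        // to every TaskManager. The URI below is a placeholder.
        env.registerCachedFile("hdfs:///models/model.bin", "model");

        env.fromElements(1.0, 2.0, 3.0)
           .map(new CachedModelFunction())
           .print();

        env.execute("model-scoring");
    }

    /** Reads the cached model file in open() and uses it to score elements. */
    public static class CachedModelFunction extends RichMapFunction<Double, Double> {

        private transient double bias; // stand-in for real model state

        @Override
        public void open(Configuration parameters) throws Exception {
            File modelFile =
                getRuntimeContext().getDistributedCache().getFile("model");
            // Deserialize the file with the library of your choice; here we
            // only check that it exists and fall back to a dummy value.
            bias = modelFile.exists() ? 1.0 : 0.0;
        }

        @Override
        public Double map(Double value) {
            return value + bias;
        }
    }
}
```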
