Serve trained Tensorflow model with REST API using Flask?

I have a trained Tensorflow model and I want to serve the prediction method with a REST API. What I can think of is using Flask to build a simple REST API that receives JSON as input, calls the predict method in Tensorflow, and returns the predicted result to the client side.
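Something like this minimal sketch is what I have in mind (the model path, input format, and preprocessing are just placeholders):

    # Minimal sketch of the idea above; the model path, input format, and
    # output handling are placeholders, not a real deployment.
    import numpy as np
    import tensorflow as tf
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    # Load the trained model once at startup, not on every request.
    model = tf.keras.models.load_model("/path/to/saved_model")

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(force=True)
        # Assume the client sends {"instances": [[...], ...]}
        inputs = np.array(payload["instances"], dtype=np.float32)
        predictions = model.predict(inputs)
        return jsonify({"predictions": predictions.tolist()})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)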

I would like to know: are there any concerns with doing it this way, especially in a production environment?

Many thanks!

The first concern that comes to mind is performance.

The TensorFlow team seems to have worked out server/client usage. You may want to look into TensorFlow Serving. By default, it uses gRPC as the communication protocol.

We use Flask + TensorFlow Serving at work. Our setup might not be the most optimal way to serve models, but it gets the job done and has worked fine for us so far.

The setup is the following:

  1. Because tfserving takes forever to build, we built a docker image (no GPU support or anything, but it works for just serving a model, and it's faster and better than serving it directly from within a huge Python/Flask monolith). The model server image can be found here: https://hub.docker.com/r/epigramai/model-server/
  2. Then Flask is used to set up an API. In order to send requests to the model server we need a gRPC prediction client, so we built one in Python that we can import directly into the Flask API: https://github.com/epigramai/tfserving_predict_client/ (a minimal sketch of such a client follows below this list).
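For illustration, a minimal gRPC prediction client using the standard tensorflow-serving-api package might look like the sketch below; the model name, signature name, input tensor name, and input shape are assumptions that must match how the model was actually exported:

    # Minimal sketch of a gRPC prediction client for TensorFlow Serving.
    # Model name, signature name, tensor name, and shape are assumptions.
    import grpc
    import numpy as np
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    channel = grpc.insecure_channel("localhost:8500")  # default tfserving gRPC port
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    req = predict_pb2.PredictRequest()
    req.model_spec.name = "my_model"
    req.model_spec.signature_name = "serving_default"
    req.inputs["input"].CopyFrom(
        tf.make_tensor_proto(np.zeros((1, 224, 224, 3), dtype=np.float32)))

    response = stub.Predict(req, timeout=5.0)
    print(response.outputs)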

The good thing here is that the model is not served by the Flask API application. The docker model server can easily be replaced with a model server running on a GPU, compiled for the machine's hardware, instead of the docker container.

I think that one of your main concerns might be batching the requests. For example, let's say your model is a trained CNN like VGG, Inception, or similar. If you implement a regular web service with Flask, then for each prediction request you receive (assuming you're running on a GPU) you will run the prediction of a single image on the GPU, which can be suboptimal since you could batch similar requests.

That's one of the things that TensorFlow Serving aims to offer: the ability to combine requests for the same model/signature into a single batch before sending them to the GPU, making more efficient use of resources and (potentially) improving throughput. You can find more information here: https://github.com/tensorflow/serving/tree/master/tensorflow_serving/batching
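For reference, batching in TensorFlow Serving is typically switched on with server flags plus a text-protobuf parameters file; the flag names below come from the serving documentation linked above, and the values are purely illustrative, not recommendations:

    # Hypothetical invocation; values are illustrative only.
    tensorflow_model_server --port=8500 \
        --model_name=my_model --model_base_path=/models/my_model \
        --enable_batching=true --batching_parameters_file=batching.config

    # batching.config (text protobuf)
    max_batch_size { value: 32 }
    batch_timeout_micros { value: 5000 }
    num_batch_threads { value: 4 }
    max_enqueued_batches { value: 100 }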

That said, it depends very much on the scenario. But batching of predictions is something important to keep in mind.
