
How to speed up a TensorFlow 2 Keras model for inference?

So there's a big update nowadays, moving from TensorFlow 1.X to 2.X.

In TF 1.X I got used to a pipeline which helped me to push my keras model to production. The pipeline: keras (h5) model --> freeze & convert to pb --> optimize pb. This workflow helped me to speed up the inference, and my final model could be stored as a single (pb) file, not a folder (see SavedModel format).

How can I optimize my model for inference in TensorFlow 2.0.0?

My first impression was that I need to convert my tf.keras model to tflite, but since my GPU uses float32 operations, this conversion would make my life even harder.

Thanks.

One way to go about it is to optimize your model using Tensorflow with TensorRT (TF-TRT) ( https://github.com/tensorflow/tensorrt ). However, in Tensorflow 2, models are saved in a folder instead of a single .pb file. This is also the case for TF-TRT optimized models; they are stored in a folder. You can convert your model to TF-TRT as:

import tensorflow as tf

saved_model_dir = "saved_model"  # path to your TF 2 SavedModel directory

converter = tf.experimental.tensorrt.Converter(input_saved_model_dir=saved_model_dir)
converter.convert()
converter.save("trt_optimized_model")  # the optimized model is saved to a directory
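
For completeness, a minimal sketch of loading the optimized model back and running inference. The "serving_default" signature key is the TF default; the input name and shape used here are assumptions and depend on your model:

import tensorflow as tf

# Load the TF-TRT optimized SavedModel produced by converter.save() above
loaded = tf.saved_model.load("trt_optimized_model")
infer = loaded.signatures["serving_default"]

# Inspect the expected input names and shapes first
print(infer.structured_input_signature)

dummy = tf.random.uniform((1, 224, 224, 3), dtype=tf.float32)  # replace with your real input
result = infer(input_1=dummy)  # 'input_1' is an assumed input name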

If you have a requirement that the model needs to be contained in a single file (and do not care about the optimization offered by TF-TRT), you can convert the SavedModel to ONNX and use ONNX Runtime for inference. You can even go one step further here and convert the ONNX file into TensorRT ( https://developer.nvidia.com/Tensorrt ). This will give you a single optimized file that you can run using TensorRT (note that you cannot run the resulting file with Tensorflow anymore).
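
A rough sketch of that route, assuming the tf2onnx and onnxruntime packages are installed; the model paths and input shape below are placeholders:

# Convert the SavedModel directory to a single ONNX file (run in a shell):
#   python -m tf2onnx.convert --saved-model model --output model.onnx

import numpy as np
import onnxruntime as ort

# Load the ONNX file and run inference with ONNX Runtime
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name

dummy = np.random.rand(1, 224, 224, 3).astype(np.float32)  # replace with your real input
outputs = session.run(None, {input_name: dummy})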

The other way to go about it is to use a different toolkit for the inference, e.g. OpenVINO. OpenVINO is optimized specifically for Intel hardware, but it should work with any CPU. It optimizes your model by converting it to Intermediate Representation (IR), performing graph pruning and fusing some operations into others while preserving accuracy. Then it uses vectorization at runtime.

It's rather straightforward to convert the Tensorflow model to OpenVINO unless you have fancy custom layers. The full tutorial on how to do it can be found here. Some snippets below.

Install OpenVINO

The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.

pip install openvino-dev[tensorflow2]

Save your model as SavedModel

OpenVINO is not able to convert the HDF5 model, so you have to save it as SavedModel first.

import tensorflow as tf
from custom_layer import CustomLayer

# Load the Keras HDF5 model (custom_objects is only needed if you have custom layers)
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})

# Export it in the SavedModel format that the Model Optimizer can read
tf.saved_model.save(model, 'model')

Use Model Optimizer to convert the SavedModel

The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the Tensorflow model to IR, which is the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (just change data_type). Run in the command line:

mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP32 --output_dir "model_ir"
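
For example, the FP16 variant mentioned above only changes the data_type flag; the output directory name here is just a suggestion:

mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP16 --output_dir "model_ir_fp16"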

Run the inference

The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what the best choice for you is, just use AUTO.

from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get output layer
output_layer_ir = compiled_model_ir.output(0)

# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
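
A minimal sketch of how input_image could be prepared to match the [1, 3, 224, 224] shape used above, assuming OpenCV is available; the file name is a placeholder and any normalization your model expects is omitted:

import cv2
import numpy as np

image = cv2.imread("example.jpg")                     # hypothetical file name
image = cv2.resize(image, (224, 224))                 # match the --input_shape width/height
image = image.transpose(2, 0, 1)                      # HWC -> CHW
input_image = np.expand_dims(image, 0).astype(np.float32)  # add batch dim -> [1, 3, 224, 224]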

Disclaimer: I work on OpenVINO.

