How to use "model.trt" in Python

I have a PyTorch model that I exported to ONNX and converted to a TensorRT engine with the following command:

trtexec --onnx=model.onnx --batch=400 --saveEngine=model.trt

All of this works, but how do I now load this model.trt in Python and run the inference?

Found an answer based on this tutorial.

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda

# initialize CUDA and create a context on the first GPU
cuda.init()
dev = cuda.Device(0)
ctx = dev.make_context()

try:
    TRT_LOGGER = trt.Logger(trt.Logger.INFO)
    with open("model.trt", 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())

    with engine.create_execution_context() as context:
        # get sizes of input and output and allocate memory required for input data and for output data
        for binding in engine:
            if engine.binding_is_input(binding):  # we expect only one input
                input_shape = engine.get_binding_shape(binding)
                input_size = trt.volume(input_shape) * engine.max_batch_size * np.dtype(np.float32).itemsize  # in bytes
                device_input = cuda.mem_alloc(input_size)
            else:  # and one output
                output_shape = engine.get_binding_shape(binding)
                # create page-locked memory buffers (i.e. won't be swapped to disk)
                host_output = cuda.pagelocked_empty(trt.volume(output_shape) * engine.max_batch_size, dtype=np.float32)
                device_output = cuda.mem_alloc(host_output.nbytes)

        stream = cuda.Stream()

        # batch is your preprocessed input data (see the note below); it must be contiguous float32
        host_input = np.array(batch, dtype=np.float32, order='C')
        cuda.memcpy_htod_async(device_input, host_input, stream)

        context.execute_async(bindings=[int(device_input), int(device_output)], stream_handle=stream.handle)
        cuda.memcpy_dtoh_async(host_output, device_output, stream)
        stream.synchronize()

        # postprocess results
        output_data = host_output.reshape(engine.max_batch_size, output_shape[0]).T

finally:
    ctx.pop()
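
Note that batch in the snippet above is not defined anywhere; it stands for whatever preprocessed input you want to run through the network. A minimal sketch, assuming a hypothetical model that takes 400 images of shape 3x224x224 (substitute your model's actual input shape and real data):

# hypothetical placeholder input; replace with your real preprocessed data,
# shaped to match the engine's input binding
batch = np.random.rand(400, 3, 224, 224).astype(np.float32)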

The official documentation has a lot of examples. The basic steps to follow are:

  • ONNX parser: takes a trained model in ONNX format as input and populates a network object in TensorRT
  • Builder: takes a network in TensorRT and generates an engine that is optimized for the target platform
  • Engine: takes input data, performs inference and emits the inference output
  • Logger: object associated with the builder and engine that captures errors, warnings and other information during the build and inference phases (see the sketch after this list for how these pieces fit together)
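
A condensed sketch of how these steps chain together, using the same TensorRT 7-era Python API as the rest of this answer (model.onnx is a placeholder path):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)      # Logger: shared by builder, parser and runtime
builder = trt.Builder(logger)                # Builder: turns the network into an optimized engine
flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flag)
parser = trt.OnnxParser(network, logger)     # ONNX parser: populates the network definition
with open("model.onnx", "rb") as f:
    parser.parse(f.read())
engine = builder.build_cuda_engine(network)  # Engine: ready to run inference on input data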

An example that builds and saves an engine is:

import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda
from onnx import ModelProto

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)

batch_size = 1
explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
#inp_shape = [batch_size, 3, 1024, 1024] # the shape I was using


def build_engine(onnx_path, shape):
    with trt.Builder(TRT_LOGGER) as builder,builder.create_builder_config() as config,\
    builder.create_network(explicit_batch) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        if builder.platform_has_fast_fp16:
            builder.fp16_mode = True
        builder.max_workspace_size = (1 << 30)
        #builder.max_workspace_size = (3072 << 20)
        #profile = builder.create_optimization_profile()
        #config.max_workspace_size = (3072 << 20)
        #config.add_optimization_profile(profile)
        print("parsing")
        with open(onnx_path, 'rb') as model:
            print("onnx found")
            if not parser.parse(model.read()):
                print("parse failed")
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        last_layer = network.get_layer(network.num_layers - 1)
        # if the parser did not mark any outputs, mark the last layer's output via the TensorRT API
        if network.num_outputs == 0:
            network.mark_output(last_layer.get_output(0))
        network.get_input(0).shape = shape

        engine = builder.build_cuda_engine(network)
        return engine


def save_engine(engine, file_name):
    buf = engine.serialize()
    with open(file_name, 'wb') as f:
        f.write(buf)


def load_engine(trt_runtime, plan_path):
    with open(plan_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine


if __name__ == "__main__":
    onnx_path = "./path/to/your/model.onnx"
    engine_name = "./path/to/engine.plan"
    
    model = ModelProto()
    with open(onnx_path, "rb") as f:
        model.ParseFromString(f.read())

    # read the input dimensions (C, H, W) from the ONNX graph
    d0 = model.graph.input[0].type.tensor_type.shape.dim[1].dim_value
    d1 = model.graph.input[0].type.tensor_type.shape.dim[2].dim_value
    d2 = model.graph.input[0].type.tensor_type.shape.dim[3].dim_value
    shape = [batch_size, d0, d1, d2]
    print(shape)

    print("trying to build engine")
    engine = build_engine(onnx_path, shape)
    save_engine(engine, engine_name)

    print("finished")

Follow this page for another example and more information.
