TorchServe: How to convert bytes output to tensors
I have a model that is served using TorchServe. I'm communicating with the TorchServe server using gRPC. The final postprocess method of the custom handler returns a list, which is converted into bytes for transfer over the network.
The postprocess method:
def postprocess(self, data):
    # data type - torch.Tensor
    # data shape - [1, 17, 80, 64] and data dtype - torch.float32
    return data.tolist()
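For a sense of why this is costly: data.tolist() turns the tensor into a deeply nested Python list, and the textual form of that list that travels over the wire is several times larger than the raw float32 buffer. A quick local sketch (standalone, not TorchServe code):

```python
import torch

# A [1, 17, 80, 64] float32 tensor, as in the handler above
data = torch.zeros(1, 17, 80, 64, dtype=torch.float32)

# postprocess returns data.tolist(): a deeply nested Python list
nested = data.tolist()

# The text form of that list is what ends up crossing the network,
# and it is much larger than the tensor's raw bytes
as_text = str(nested).encode("utf-8")
raw = data.numpy().tobytes()  # 1 * 17 * 80 * 64 * 4 = 348160 bytes
print(len(raw), len(as_text))
```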
The main issue is at the client, where converting the received bytes from TorchServe to a torch.Tensor is done inefficiently via ast.literal_eval:
# This takes 0.3 seconds
response = self.inference_stub.Predictions(
    inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
# This takes 0.84 seconds
predictions = torch.as_tensor(literal_eval(
    response.prediction.decode('utf-8')))
Using numpy.frombuffer or torch.frombuffer returns the following error.
import numpy as np
np.frombuffer(response.prediction)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size

np.frombuffer(response.prediction, dtype=np.float32)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size
Using torch:
import torch
torch.frombuffer(response.prediction, dtype=torch.float32)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer length (2601542 bytes) after offset (0 bytes) must be a multiple of element size (4)
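The traceback hints at the cause: response.prediction holds UTF-8 text (the string form of the nested list), whose length (2601542 bytes) is not a multiple of 4, whereas the raw float32 data for a [1, 17, 80, 64] tensor would be exactly 1 × 17 × 80 × 64 × 4 = 348160 bytes. A local sketch of a raw-bytes round-trip where frombuffer does work (function names are illustrative, not TorchServe API):

```python
import torch

# Illustrative only -- not TorchServe API. A handler could return the
# tensor's raw float32 bytes instead of a nested list ...
def postprocess_raw(data: torch.Tensor) -> list:
    return [data.numpy().tobytes()]

# ... letting the client rebuild the tensor with frombuffer + reshape.
def client_decode(payload: bytes) -> torch.Tensor:
    # bytearray() makes a writable copy, which torch.frombuffer prefers
    return torch.frombuffer(bytearray(payload),
                            dtype=torch.float32).reshape(1, 17, 80, 64)

original = torch.randn(1, 17, 80, 64)
restored = client_decode(postprocess_raw(original)[0])
print(torch.equal(original, restored))  # True
```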
Is there an alternative, more efficient solution for converting the received bytes into a torch.Tensor?
One hack I've found that significantly improves performance when sending large tensors is to return the data as JSON.
In your handler's postprocess function:
def postprocess(self, data):
    output_data = {}
    output_data['data'] = data.tolist()
    return [output_data]
At the client side, when you receive the gRPC response, decode it using json.loads:
response = self.inference_stub.Predictions(
    inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
decoded_output = response.prediction.decode('utf-8')
preds = torch.as_tensor(json.loads(decoded_output))
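A standalone sketch of that JSON round-trip (json.dumps stands in for TorchServe's serialization here; depending on how TorchServe serializes the returned dict, the client may need to index into the 'data' key as shown):

```python
import json
import torch

data = torch.randn(2, 3, dtype=torch.float32)

# Handler side: postprocess wraps data.tolist() in a dict, which is
# serialized to JSON bytes for the gRPC response
payload = json.dumps({"data": data.tolist()}).encode("utf-8")

# Client side: decode the bytes, parse the JSON, rebuild the tensor
decoded_output = payload.decode("utf-8")
preds = torch.as_tensor(json.loads(decoded_output)["data"],
                        dtype=torch.float32)
print(torch.equal(data, preds))  # True
```

Python's float repr round-trips exactly through JSON, so the reconstructed float32 tensor matches the original bit for bit.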
preds should now hold the output tensor.