TorchServe: How to convert bytes output to tensors

I have a model that is served using TorchServe, and I'm communicating with the TorchServe server using gRPC. The final postprocess method of the custom handler returns a list, which is converted into bytes for transfer over the network.

The postprocess method:

def postprocess(self, data):
    # data: torch.Tensor of shape [1, 17, 80, 64], dtype torch.float32
    return data.tolist()
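For context, tolist() turns the tensor into nested Python lists, and it is the text representation of those lists that travels over the wire (hence the decode('utf-8') on the client below). A minimal illustration with a reduced shape:

import torch

t = torch.zeros(1, 2, 2, dtype=torch.float32)
print(t.tolist())  # [[[0.0, 0.0], [0.0, 0.0]]] - text like this, not raw float32 bytes, is what gets sent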

The main issue is on the client side, where converting the bytes received from TorchServe into a torch.Tensor is done inefficiently via ast.literal_eval, which re-parses the entire text payload into nested Python lists:

import torch
from ast import literal_eval

# This takes 0.3 seconds
response = self.inference_stub.Predictions(
    inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
# This takes 0.84 seconds
predictions = torch.as_tensor(literal_eval(
    response.prediction.decode('utf-8')))

Using numpy.frombuffer or torch.frombuffer raises the following errors.

import numpy as np

np.frombuffer(response.prediction)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size

np.frombuffer(response.prediction, dtype=np.float32)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size

Using torch:

import torch
torch.frombuffer(response.prediction, dtype=torch.float32)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer length (2601542 bytes) after offset (0 bytes) must be a multiple of element size (4)
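Both failures have the same cause: response.prediction holds the UTF-8 text of the nested list, not raw tensor data, so its length (2,601,542 bytes) need not be a multiple of the 4-byte float32 element size (nor of numpy's 8-byte float64 default). frombuffer would only apply if the handler shipped the raw buffer itself. A sketch of that approach, assuming postprocess is allowed to return a bytes entry per batch item (an untested assumption here) and using the [1, 17, 80, 64] float32 shape from above:

import numpy as np
import torch

# Handler side (sketch): ship the raw float32 buffer instead of a text list.
def postprocess(self, data):
    return [data.detach().cpu().numpy().tobytes()]

# Client side (sketch): reinterpret the bytes and restore the known shape.
arr = np.frombuffer(response.prediction, dtype=np.float32).copy()  # copy to get a writable array
preds = torch.from_numpy(arr).reshape(1, 17, 80, 64)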

Is there an alternative, more efficient way of converting the received bytes into a torch.Tensor?

One hack I've found that significantly increases performance when sending large tensors is to return the data as JSON: the standard json parser is much faster than ast.literal_eval on large payloads.

In your handler's postprocess function:

def postprocess(self, data):
    # Wrap the nested list in a dict so the response is serialized as JSON
    output_data = {}
    output_data['data'] = data.tolist()
    return [output_data]

On the client side, when you receive the gRPC response, decode it using json.loads:

import json
import torch

response = self.inference_stub.Predictions(
    inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
decoded_output = response.prediction.decode('utf-8')
# postprocess wrapped the list in a dict, so pull it back out of the 'data' key
preds = torch.as_tensor(json.loads(decoded_output)['data'])

preds should now contain the output tensor.
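As a quick sanity check on the round trip (shape from the handler above; torch.as_tensor builds float32 from Python floats by default):

print(preds.shape, preds.dtype)  # torch.Size([1, 17, 80, 64]) torch.float32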
