TorchServe: How to convert bytes output to tensors
I have a model that is served using TorchServe. I'm communicating with the TorchServe server using gRPC. The final postprocess method of the custom handler returns a list, which is converted into bytes for transfer over the network.
The postprocess method:
def postprocess(self, data):
    # data type - torch.Tensor
    # data shape - [1, 17, 80, 64] and data dtype - torch.float32
    return data.tolist()
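For a sense of why this is costly: data.tolist() turns the tensor into a deeply nested Python list, and the textual form of that list that travels over the wire is several times larger than the raw float32 buffer. A quick local sketch (standalone, not TorchServe code):

```python
import torch

# A [1, 17, 80, 64] float32 tensor, as in the handler above
data = torch.zeros(1, 17, 80, 64, dtype=torch.float32)

# postprocess returns data.tolist(): a deeply nested Python list
nested = data.tolist()

# The text form of that list is what ends up crossing the network,
# and it is much larger than the tensor's raw bytes
as_text = str(nested).encode("utf-8")
raw = data.numpy().tobytes()  # 1 * 17 * 80 * 64 * 4 = 348160 bytes
print(len(raw), len(as_text))
```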
The main issue is at the client, where converting the received bytes from TorchServe to a torch.Tensor is done inefficiently via ast.literal_eval:
# This takes 0.3 seconds
response = self.inference_stub.Predictions(
    inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
# This takes 0.84 seconds
predictions = torch.as_tensor(literal_eval(
    response.prediction.decode('utf-8')))
Using numpy.frombuffer or torch.frombuffer returns the following error.
import numpy as np
np.frombuffer(response.prediction)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size

np.frombuffer(response.prediction, dtype=np.float32)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer size must be a multiple of element size
Using torch:
import torch
torch.frombuffer(response.prediction, dtype=torch.float32)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: buffer length (2601542 bytes) after offset (0 bytes) must be a multiple of element size (4)
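The traceback hints at the cause: response.prediction holds UTF-8 text (the string form of the nested list), whose length (2601542 bytes) is not a multiple of 4, whereas the raw float32 data for a [1, 17, 80, 64] tensor would be exactly 1 × 17 × 80 × 64 × 4 = 348160 bytes. A local sketch of a raw-bytes round-trip where frombuffer does work (function names are illustrative, not TorchServe API):

```python
import torch

# Illustrative only -- not TorchServe API. A handler could return the
# tensor's raw float32 bytes instead of a nested list ...
def postprocess_raw(data: torch.Tensor) -> list:
    return [data.numpy().tobytes()]

# ... letting the client rebuild the tensor with frombuffer + reshape.
def client_decode(payload: bytes) -> torch.Tensor:
    # bytearray() makes a writable copy, which torch.frombuffer prefers
    return torch.frombuffer(bytearray(payload),
                            dtype=torch.float32).reshape(1, 17, 80, 64)

original = torch.randn(1, 17, 80, 64)
restored = client_decode(postprocess_raw(original)[0])
print(torch.equal(original, restored))  # True
```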
Is there an alternative, more efficient solution for converting the received bytes into a torch.Tensor?
One hack I've found that significantly improves performance when sending large tensors is to return the data as JSON.
In your handler's postprocess function:
def postprocess(self, data):
    output_data = {}
    output_data['data'] = data.tolist()
    return [output_data]
At the client side, when you receive the gRPC response, decode it using json.loads:
response = self.inference_stub.Predictions(
    inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
decoded_output = response.prediction.decode('utf-8')
preds = torch.as_tensor(json.loads(decoded_output))
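A standalone sketch of that JSON round-trip (json.dumps stands in for TorchServe's serialization here; depending on how TorchServe serializes the returned dict, the client may need to index into the 'data' key as shown):

```python
import json
import torch

data = torch.randn(2, 3, dtype=torch.float32)

# Handler side: postprocess wraps data.tolist() in a dict, which is
# serialized to JSON bytes for the gRPC response
payload = json.dumps({"data": data.tolist()}).encode("utf-8")

# Client side: decode the bytes, parse the JSON, rebuild the tensor
decoded_output = payload.decode("utf-8")
preds = torch.as_tensor(json.loads(decoded_output)["data"],
                        dtype=torch.float32)
print(torch.equal(data, preds))  # True
```

Python's float repr round-trips exactly through JSON, so the reconstructed float32 tensor matches the original bit for bit.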
preds should now hold the output tensor.