
How to serialize Tensorflow Serving request to reduce inference/predict latency?

I have exported a TF SavedModel from an Estimator using TensorServingInputReceiver as follows:

def serving_input_fn():
    input_ph = tf.placeholder(tf.float32, shape=[None, 3, 224, 224], name = 'image_batches')
    input_tensors = input_ph
    return tf.estimator.export.TensorServingInputReceiver(input_tensors, input_ph)

and export the SavedModel as follows:

warm_start = tf.estimator.WarmStartSettings(CKPT_DIR)
classifier = tf.estimator.Estimator(model_fn = model_fn, warm_start_from = warm_start)
classifier.export_savedmodel(export_dir_base = SAVED_MODEL_DIR, serving_input_receiver_fn = serving_input_fn)

However, when I use this SavedModel to perform predictions in Tensorflow Serving:

json_dict = {'signature_name': 'serving_default', 'instances': data}

where data is a numpy array, I get only about 1/5 to 1/6 of the speed of direct local inference with the SavedModel.
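For reference, the request is built and sent roughly like this (the server URL and model name below are placeholders, not the actual deployment):

import json
import numpy as np
import requests

# Placeholder endpoint; host, port and model name depend on the deployment.
SERVER_URL = 'http://localhost:8501/v1/models/my_model:predict'

# Dummy batch matching the [None, 3, 224, 224] placeholder in serving_input_fn.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# The numpy array has to be converted to nested Python lists before JSON encoding.
json_dict = {'signature_name': 'serving_default', 'instances': data.tolist()}
response = requests.post(SERVER_URL, data=json.dumps(json_dict))
predictions = response.json()['predictions']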

Currently I think the problem may be the serialization of the request into JSON, as suggested here. So does anyone know how to serialize the request before sending it, or have any suggestions as to why inference with TF Serving is so much slower than direct inference?
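For context, the usual alternative to the JSON REST API is TensorFlow Serving's gRPC endpoint, which accepts the batch as a binary TensorProto. A minimal sketch follows; the port, model name, and the input key 'input' are all assumptions (saved_model_cli show --dir <export_dir> --all prints the actual signature names), and it needs the tensorflow-serving-api package:

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Assumed host/port and model name; adjust to the actual deployment.
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
request.model_spec.signature_name = 'serving_default'

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
# make_tensor_proto packs the array into a compact binary TensorProto,
# avoiding the float-to-text conversion that json.dumps performs.
request.inputs['input'].CopyFrom(tf.make_tensor_proto(batch, shape=batch.shape))

result = stub.Predict(request, 10.0)  # 10-second timeout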

I have been facing the same situation: json.dumps causes about 90% of the delay. If the image size is reduced to, say, (1, 100, 100, 3), the inference speed increases roughly 3-fold, so basically the json.dumps payload is too large and thus takes more time to write to and read from memory. I tried ultrajson (ujson), but there wasn't any observable improvement.

trs = img1.tolist()
data = json.dumps({"signature_name": "serving_default", "instances": trs})

As my model architecture requires INT, I don't see any other option for serializing the data.

Modifying the input architecture can change our input requirement from UINT to STRING, which can be done as follows:

dl_request = requests.get(IMAGE, stream=True)
jpeg_bytes = base64.b64encode(dl_request.content).decode('utf-8')
predict_request = '{"instances" : [{"b64": "%s"}]}' % jpeg_bytes
response = requests.post(SERVER_URL, data=predict_request)

This gives good results, but how do I convert the model input type from INT to STRING?
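One possible direction is to export a serving signature whose receiver tensor is a batch of encoded image bytes (tf.string) and to do the JPEG decoding inside the graph, so the client only ships the compressed bytes. A rough sketch; the placeholder name, the 224x224 resize, and the final int32 cast are assumptions and have to match whatever model_fn actually expects:

def serving_input_fn():
    # Batch of JPEG-encoded images; TF Serving fills this placeholder from the
    # {"b64": ...} values in the REST request after base64-decoding them.
    serialized_images = tf.placeholder(tf.string, shape=[None], name='encoded_image_bytes')

    def decode_and_preprocess(jpeg_bytes):
        image = tf.image.decode_jpeg(jpeg_bytes, channels=3)   # uint8, shape [H, W, 3]
        image = tf.image.resize_images(image, [224, 224])      # assumed target size; returns float32
        return tf.cast(image, tf.int32)                        # cast back for a model that expects INT input

    images = tf.map_fn(decode_and_preprocess, serialized_images, dtype=tf.int32)
    return tf.estimator.export.TensorServingInputReceiver(images, serialized_images)

With such a signature, the base64 request shown above should work unchanged, without json.dumps-ing the raw pixel array.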

Any suggestions?


