
What does “max_batch_size” mean in tensorflow-serving batching_config.txt?

I'm using tensorflow-serving on GPUs with --enable-batching=true.

However, I'm a little confused by max_batch_size in batching_config.txt.

My client sends an input tensor with shape [-1, 1000] in a single gRPC request, where dim0 ranges over (0, 200]. I set max_batch_size = 100 and receive errors like:

"gRPC call return code: 3:Task size 158 is larger than maximum batch size 100"

"gRPC call return code: 3:Task size 162 is larger than maximum batch size 100"

It looks like max_batch_size limits dim0 of a single request. But since tensorflow-serving batches multiple requests into one batch, I thought it meant the maximum number of requests per batch.

Here is the description from the docs:

max_batch_size: The maximum size of any batch. This parameter governs the throughput/latency tradeoff, and also avoids having batches that are so large they exceed some resource constraint (eg GPU memory to hold a batch's data).
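For reference, the batching config file passed via --batching_parameters_file is a text-format protobuf; a minimal sketch might look like the following (the specific values here are illustrative, not a recommendation):

```
max_batch_size { value: 200 }
batch_timeout_micros { value: 1000 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 4 }
```

With max_batch_size set to 200, a single request whose first dimension is 158 would fit within one batch.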

In ML the first dimension usually represents the batch. So, based on my understanding, tensorflow-serving treats the value of the first dimension as the batch size and raises an error whenever it exceeds the allowed value. Note that max_batch_size bounds the total number of rows in a batch, not the number of requests, so a single request with dim0 = 158 already exceeds a limit of 100. You can verify this by issuing some requests where you manually keep the first dimension below 100; I expect the error to disappear.

After that, you can modify your inputs to be sent in the proper format.
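One way to do that on the client side is to split any request whose first dimension exceeds max_batch_size into several smaller requests. A minimal sketch (the function name and the chunk limit are illustrative, not part of the tensorflow-serving API):

```python
MAX_BATCH_SIZE = 100  # assumed to match max_batch_size in batching_config.txt

def split_into_chunks(rows, max_batch_size=MAX_BATCH_SIZE):
    """Yield consecutive slices of `rows`, each at most max_batch_size rows long.

    Each chunk would then be sent as its own gRPC request, so no single
    request's dim0 exceeds the server's max_batch_size.
    """
    for start in range(0, len(rows), max_batch_size):
        yield rows[start : start + max_batch_size]

# Example: 158 rows of 1000 features each -> two requests of 100 and 58 rows.
rows = [[0.0] * 1000 for _ in range(158)]
chunks = list(split_into_chunks(rows))
print([len(chunk) for chunk in chunks])  # [100, 58]
```

The server can still merge these smaller requests with others into larger batches (up to max_batch_size total rows), so splitting on the client does not necessarily cost throughput.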
