
gRPC - Accumulate requests from Multiple clients

Original 2020-10-20 15:21:15 5 1 gpu/ grpc/ inference/ batching/ serving

Let's assume I have multiple clients sending requests to a server (a gRPC service). I would like my server to collect, say, 8 requests, process them all at once, and only then send the results back to the clients. I'm not sure how to do this with gRPC's functionality, whether it is even possible, or whether I need something else.

Context: my use case is serving a neural network that runs on a GPU. In this case, it is much more efficient to batch the inputs of multiple requests, run one inference, and send the results back than to run one inference per input.

1 answer

There are at least 3 options, listed here in order of increasing complexity:

1. Clients call the server with their data, and the server responds with a batch number. Clients then use the batch number to make a "Done yet?" RPC against the server. This is the simplest approach, but it relies on polling and is more wasteful.
2. Clients call the server with their data, and the server responds with a stream of messages updating the client on the batch's state: working, working, working, done [results]. The advantage is that the stream acts as an implicit callback (made explicit in #3 below). The disadvantage is that the stream is redundant if you are not interested in intermediate states.
3. Clients call the server with their data and a callback address. The server (acting as a gRPC client) uses the callback address to make an RPC on the client (acting as a gRPC server). This is the most complex option, and likely unnecessarily so given #1 and #2.
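
As a rough illustration of the first option (batch number plus a "Done yet?" RPC), here is a minimal sketch of the server-side bookkeeping in plain Python, without any gRPC code. Everything in it is hypothetical: `BatchAccumulator`, `submit`, `poll`, and `fake_model` are names made up for this example, and it assumes the two methods would be called from the bodies of two unary RPC handlers running on the server's thread pool.

```python
import threading
from typing import Callable, Dict, List, Optional, Tuple


class BatchAccumulator:
    """Hypothetical helper: groups inputs into fixed-size batches."""

    def __init__(self, batch_size: int,
                 run_batch: Callable[[List[float]], List[float]]):
        self._batch_size = batch_size
        self._run_batch = run_batch          # e.g. one GPU inference call
        self._lock = threading.Lock()
        self._pending: List[float] = []      # inputs of the batch being filled
        self._batch_id = 0                   # number of the batch being filled
        self._results: Dict[int, List[float]] = {}

    def submit(self, x: float) -> Tuple[int, int]:
        """Body of the 'submit' RPC: returns (batch number, slot in batch)."""
        with self._lock:
            batch_id, slot = self._batch_id, len(self._pending)
            self._pending.append(x)
            full = len(self._pending) == self._batch_size
            if full:
                # Close the current batch and start a new one.
                inputs, self._pending = self._pending, []
                self._batch_id += 1
        if full:
            # The request that completes the batch runs it, outside the lock.
            outputs = self._run_batch(inputs)
            with self._lock:
                self._results[batch_id] = outputs
        return batch_id, slot

    def poll(self, batch_id: int, slot: int) -> Optional[float]:
        """Body of the 'Done yet?' RPC: None means the batch is not ready."""
        with self._lock:
            outputs = self._results.get(batch_id)
        return None if outputs is None else outputs[slot]


def fake_model(inputs: List[float]) -> List[float]:
    # Stand-in for a batched neural-network inference.
    return [2 * x for x in inputs]


acc = BatchAccumulator(batch_size=8, run_batch=fake_model)
tickets = [acc.submit(float(i)) for i in range(8)]    # 8 client requests
results = [acc.poll(b, s) for b, s in tickets]        # each client polls
```

This sketch leaves out things a real service would likely need: a timeout that flushes a partially filled batch, eviction of old results, and error handling if the batch call fails.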


The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

solution1 1 2020-10-20 16:36:33
