gRPC - Accumulate requests from Multiple clients

Let's assume I have multiple clients sending requests to a server (a gRPC service). I would like the server to collect, let's say, 8 requests, process them all at once, and only then send the results back to the clients. I'm not sure how to do this with gRPC's features, whether it's possible at all, or whether I need something else.

Context: my use case is serving a neural network that runs on a GPU. In this case, it's much more efficient to batch the inputs of multiple requests, run one inference, and send the results back than to run one inference per input.

There are at least 3 options, listed here in order of increasing complexity:

  1. Clients make calls to the server with their data. The server responds with a batch number. Clients then use the batch number to make a "Done yet?" RPC against the server. This is the simplest approach, but it relies on polling and is therefore more wasteful.

  2. Clients make calls to the server with their data. The server responds with a stream of messages updating the client on the batch's state: working, working, working, done [results]. The advantage is that the stream acts as an implicit 'callback' (made explicit in #3 below). The disadvantage is that the stream is redundant if you don't care about intermediate states.

  3. Clients make calls to the server with their data and a callback address. The server (acting as a gRPC client) uses the callback address to make an RPC on the client (acting as a gRPC server). This is the most complex option, and likely unnecessarily so given #1 and #2.
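
To make option #1 more concrete, here is a minimal sketch of the server-side bookkeeping in plain Python. It deliberately leaves gRPC out: `Batcher`, `submit`, `poll`, and `double_all` are hypothetical names, and the assumption is that `submit` and `poll` would be the bodies of two unary RPC handlers (one that accepts data and returns a batch number, one that answers "Done yet?"). `double_all` stands in for the single batched GPU inference.

```python
import threading
from typing import Callable, Dict, List, Optional, Tuple


class Batcher:
    """Collects inputs into fixed-size batches and runs one call per batch."""

    def __init__(self, batch_size: int,
                 run_batch: Callable[[List[float]], List[float]]):
        self._batch_size = batch_size
        self._run_batch = run_batch      # stand-in for the batched inference
        self._lock = threading.Lock()    # handlers may run on several threads
        self._current_id = 0
        self._pending: List[float] = []
        self._results: Dict[int, List[float]] = {}

    def submit(self, item: float) -> Tuple[int, int]:
        """Hypothetical 'Submit' handler body: returns (batch number, slot)."""
        with self._lock:
            batch_id, slot = self._current_id, len(self._pending)
            self._pending.append(item)
            if len(self._pending) == self._batch_size:
                # Batch is full: process all inputs in one call, then start
                # a new batch. Running under the lock keeps the sketch short;
                # a real server would likely hand this to a worker thread.
                self._results[batch_id] = self._run_batch(self._pending)
                self._pending = []
                self._current_id += 1
            return batch_id, slot

    def poll(self, batch_id: int, slot: int) -> Optional[float]:
        """Hypothetical 'Done yet?' handler body: None means not ready."""
        with self._lock:
            batch = self._results.get(batch_id)
            return None if batch is None else batch[slot]


def double_all(inputs: List[float]) -> List[float]:
    """Placeholder for the model: one call over the whole batch."""
    return [2 * x for x in inputs]


batcher = Batcher(batch_size=2, run_batch=double_all)
first = batcher.submit(1.0)         # first client joins batch 0
not_ready = batcher.poll(*first)    # batch 0 is not full yet
second = batcher.submit(5.0)        # second client fills batch 0
ready = batcher.poll(*first)        # first client's result is now available
```

This sketch only flushes a batch when it is full, so a partially filled batch would wait indefinitely; a real service would probably also flush after a timeout and evict results once they have been fetched.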
