
Keras Model Serving - GPU vs CPU

I would like to deploy my Keras NER model on my Django-based website.

My question is: when serving, should the model run on a CPU or a GPU? How would a GPU handle, say, hundreds of users sending requests at the same time, since it can't multi-thread the way a CPU can?

Thanks for your time.

"hundreds of users" will necessitate using a GPU.

Knowing what your users are doing would give more insight, but I would consider having a dedicated instance for your model. Design a small program that stays persistent and waits for input data using a queue strategy.

Let's say you have hundreds of users uploading text documents for your NER model. Your web application would gather and validate the uploaded text, open a socket connection to your model server, pass the text, receive the response, parse it, and respond to the user accordingly.
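
Here is a minimal sketch of that web-application side, assuming the model server listens on localhost port 9000 and that each connection exchanges one newline-terminated JSON message; the address, message format, and the ner_view name are illustrative assumptions, not part of the original setup.

    # Hypothetical Django view: forwards posted text to the persistent model
    # server and relays the parsed entities back to the user.
    # Assumptions: server at 127.0.0.1:9000, one JSON line per request/response.
    import json
    import socket

    from django.http import JsonResponse

    MODEL_SERVER_ADDR = ("127.0.0.1", 9000)  # assumed address of the model server

    def ner_view(request):
        text = request.POST.get("text", "")
        if not text:
            return JsonResponse({"error": "no text provided"}, status=400)

        # Short-lived connection to the always-running model server.
        with socket.create_connection(MODEL_SERVER_ADDR, timeout=30) as sock:
            sock.sendall((json.dumps({"text": text}) + "\n").encode("utf-8"))
            reply = sock.makefile("r").readline()  # blocks until the model answers

        return JsonResponse({"entities": json.loads(reply)})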

Your model server has a socket listener that, on each connection, pushes the socket plus the request data onto a queue. Your model sits and waits, watching that queue: it takes any item from the queue, processes it, and passes the results back via the included socket (serialized handle). It's a bit messy in Python but works fine (a rough sketch follows below). The reasons to keep a persistent model running:

1) For hundreds of users, spinning up a model process on each request takes several seconds; having it already initialized and ready to go improves the user experience.

2) With a single GPU server, you don't want competition for GPU resources.
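
And a minimal sketch of the model-server side. The helpers load_ner_model() and predict_entities(model, text) are assumed stand-ins for loading and calling your actual Keras NER model; the queue plus a single worker thread keeps exactly one consumer of the GPU.

    # Hypothetical persistent model server: the listener accepts connections and
    # queues (socket, request) pairs; a single worker owns the GPU-loaded model
    # and drains the queue one request at a time.
    import json
    import queue
    import socket
    import threading

    work_queue = queue.Queue()

    def listener(host="0.0.0.0", port=9000):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(128)
        while True:
            conn, _ = srv.accept()
            request = json.loads(conn.makefile("r").readline())  # one JSON line per request
            work_queue.put((conn, request))

    def worker():
        model = load_ner_model()  # assumed helper: loads the Keras NER model once
        while True:
            conn, request = work_queue.get()
            try:
                entities = predict_entities(model, request["text"])  # assumed helper
                conn.sendall((json.dumps(entities) + "\n").encode("utf-8"))
            finally:
                conn.close()
                work_queue.task_done()

    threading.Thread(target=worker, daemon=True).start()
    listener()

Because the model is loaded once before the loop, each request only pays inference time, and because a single worker drains the queue, requests never compete for the GPU at the same moment.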

