简体   繁体   中英

How to deploy multiple TensorFlow models using AWS?

I've trained 10 different TensorFlow models for style transfer, basically, each model is responsible to apply filters to a image based on a style image. So, every model is functioning independently and I want to integrate this into an application. Is there any way to deploy these models using AWS?

I've tried deploying these models using AWS SageMaker and then using the endpoint with AWS Lambda and then finally creating an API using API Gateway. But the catch here is that we can only deploy a single model on SageMaker, but in my case I want to deploy 10 different models.

I expect to provide a link to each model in my application, so the selected filter will trigger the model on AWS and will apply the filter.

What I did for something similar is that I created my own docker container with an api code capable of loading and predicting with multiple models. The api, when it starts it copies a model.tar.gz from an S3 bucket, and inside that tar.gz are the weights for all my models, my code then scans the content and loads all the models. If your models are too big (RAM consumption) you might need to handle this differently, as it's said here , that it loads the model only when you call predict. I load all the models at the beginning to have faster predicts. That is not actually a big change in code.

Another approach that I'm trying right now is to have the API Gateway call multiple Sagemaker endpoints, although I did not find good documentation for that.

There are couple options, and the final choice depends on your priorities in terms of cost, latency, reliability, simplicity.

  1. Different SageMaker endpoints per model - one benefit of that is that it leads to better robustness, because models are isolated from one another. If one model gets called a lot, it won't put the whole fleet down. They each live their own life, and can also be hosted on separate type of machines, to achieve better economics. Note that to achieve high-availability it is even recommended to double hardware backend (2+ servers per SageMaker endpoint), so that endpoints are multi-zone, as SageMaker does its best to host endpoint backend on different availability zones if an endpoint has two or more instances.
  2. One SageMaker TFServing multi-model endpoint - If all your models are TensorFlow models and if their artifacts are compatible with TFServing, you may be able to host all of them in a single SageMaker TFServing endpoint. See this section of the docs: Deploying more than one model to your endpoint
  3. One SageMaker Multi-Model Endpoint , a feature that was released end of 2019 and that enables hosting of multiple models in the same container.
  4. Serverless deployment in AWS Lambda - this can be cost-effective: models generate charges only when called. This is limited to pairs of {DL model ; DL framework} that fit within Lambda memory and storage limits and that do not require GPU. It's been documented couple times in the past, notably with Tensorflow and MXNet

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM