
Heroku: deploying Deep Learning model

I have developed a REST API using Flask to expose a Python Keras Deep Learning model (a CNN for text classification). I have a very simple script that loads the model into memory and outputs class probabilities for a given text input. The API works perfectly locally.

However, when I git push heroku master, I get: Compiled slug size: 588.2M is too large (max is 500M). The model is 83MB in size, which is quite small for a Deep Learning model. Notable dependencies include Keras and its TensorFlow backend.

I know that you can use GBs of RAM and disk space on Heroku. But the bottleneck seems to be the slug size. Is there a way to circumvent this? Or is Heroku just not the right tool for deploying Deep Learning models?

The first thing I would check, as suggested by others, is to find out why your repo is so big given that the model size is only 83MB.
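If you are not sure where the space is going, a few standard git and shell commands (nothing Heroku-specific here) can show how large the repository actually is:

git count-objects -vH   # total size of the objects stored in .git
du -sh .git             # disk usage of the git history itself
du -sh * | sort -h      # largest files and folders in the working tree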

If you cannot reduce the size, there is the option of offloading parts of the repo, but to do this you will still need to know which files are taking up the space. Offloading is suggested in the Heroku docs. Slug size is limited to 500MB, as stated here: https://devcenter.heroku.com/articles/slug-compiler#slug-size, and I believe this has to do with the time it takes to spin up a new instance when a change in resources is needed. However, you can use offloading if you have particularly large files. More info on offloading here: https://devcenter.heroku.com/articles/s3
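As a minimal sketch of that offloading pattern: keep the model file out of the repo, upload it to S3, and fetch it when the dyno boots, before the Flask app loads the model. The bucket and path below are hypothetical, and this assumes the object is public or the URL is presigned:

# e.g. in a startup script, before the app calls load_model()
curl -o model.h5 https://your-bucket.s3.amazonaws.com/models/model.h5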

This answer assumes that your model is only 83MB and the total size of your repository directory is smaller (likely much smaller) than 500MB.

There could be a few issues, but the obvious thing you need to do is reduce your git repository to less than 500MB.

First, try commands like the following to reduce the size of your repo (see this blog post for reference):

heroku plugins:install heroku-repo            # install the heroku-repo CLI plugin
heroku repo:gc --app your-app-name            # run git gc against the app's remote repo
heroku repo:purge_cache --app your-app-name   # delete the app's build cache

These might solve your issue.

Another potential issue is that at some point you committed another (large) model and removed it from your repo in a subsequent commit. The git repo then still includes a version of that model in your .git folder and git history. There are a few fixes for this, but if you don't need your commit history you can copy the repo to another folder and create a fresh git repo with git init. Commit everything with something like "Initial commit" and then push this repo with only one commit to Heroku, as sketched below. That will likely be a much smaller repo.
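A rough sketch of that fresh-repo approach (the directory and app names below are placeholders):

cp -r my-app my-app-fresh
cd my-app-fresh
rm -rf .git                       # drop the old history, including any large blobs
git init
git add .
git commit -m "Initial commit"
heroku git:remote --app your-app-name
git push heroku master --force    # overwrite the old history on the Heroku remote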

I would say that Heroku is not the right tool for deploying the deep learning model itself. For that, you could consider using a Platform as a Service dedicated to Deep Learning, such as FloydHub. You could deploy your Flask REST API on FloydHub too.

A lot of these answers are great for reducing slug size, but if anyone still has problems deploying a deep learning model to Heroku, it is important to note that, for whatever reason, TensorFlow 2.0 is ~500MB, whereas earlier versions are much smaller. Using an earlier version of TensorFlow can greatly reduce your slug size.
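For example, pinning a pre-2.0 release in requirements.txt (1.15.5 is just one choice of older version; pick whichever your code supports):

tensorflow==1.15.5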

You can reduce the model size and use tensorflow-cpu, which has a much smaller install size (144MB with Python 3.8):

pip install tensorflow-cpu

https://pypi.org/project/tensorflow-cpu/#files

As a resource, you can visit the Heroku Slug Compiler help page.

An 83MB model does not mean the slug is only 83MB. Packages are compiled and installed when you push to Heroku, which eats up additional slug space so that the packages are ready for use by the application. The best solution is probably to put large assets in external storage such as AWS S3 or any comparable service. As a last resort, use a different cloud service.
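On that note, one complementary trick: Heroku honors a .slugignore file at the root of the repo, so large assets that the app fetches from S3 at runtime can be kept out of the slug entirely. A minimal sketch of such a file, with hypothetical paths:

*.h5
data/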

Heroku is a very good cloud platform for deploying your apps, but if you have a Deep Learning model, i.e. an app that needs to run predictions with large CNN / Deep Learning models, then this cloud is not suitable. You can try other cloud platforms like AWS, Amazon SageMaker, Microsoft Azure, or IBM Watson.

I was facing the same issue, and after spending several days on it I found that it was the tensorflow library that was causing the slug overhead.

I solved it by changing one line in the requirements.txt file:

tensorflow-cpu==2.5.0

Instead of

tensorflow==2.5.0

You can use any more recent tensorflow version. Read more about tensorflow-cpu here.
