How to bundle Python for AWS Lambda

I have a project I'd like to run on AWS Lambda, but it exceeds the 50MB zipped limit. Right now it is at 128MB zipped, and the project folder with the virtual environment sits at 623MB. The top users of space are:

  • scipy (~187MB)
  • pandas (~108MB)
  • numpy (~74.4MB)
  • lambda_packages (~71.4MB)

Without the virtualenv the project is <2MB. The requirements.txt is:

click==6.7
cycler==0.10.0
ecdsa==0.13
Flask==0.12.2
Flask-Cors==3.0.3
future==0.16.0
itsdangerous==0.24
Jinja2==2.10
MarkupSafe==1.0
matplotlib==2.1.2
mpmath==1.0.0
numericalunits==1.19
numpy==1.14.0
pandas==0.22.0
pycryptodome==3.4.7
pyparsing==2.2.0
python-dateutil==2.6.1
python-dotenv==0.7.1
python-jose==2.0.2
pytz==2017.3
scipy==1.0.0
six==1.11.0
sympy==1.1.1
Werkzeug==0.14.1
xlrd==1.1.0

I deploy using Zappa, so my understanding of the whole infrastructure is limited. My understanding is that some (very few) of the libraries do not get uploaded; for numpy, for example, that part does not get uploaded and the version already available in Amazon's environment gets used instead.

I propose the following workflow (without using S3 buckets for slim_handler):

  1. delete all the files that match "test_*.py" in all packages
  2. manually tree-shake scipy, as I only use scipy.optimize.minimize, by deleting most of it and re-running my tests (see the sketch after this list)
  3. minify and obfuscate all the code using pyminifier
  4. zappa deploy
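
For steps 1 and 2, a minimal sketch of the cleanup (the site-packages path and the set of scipy subpackages to keep are assumptions; which siblings scipy.optimize actually needs can only be confirmed by re-running your tests):

    import pathlib
    import shutil

    # Assumption: typical virtualenv layout; adjust to your environment.
    SITE_PACKAGES = pathlib.Path("venv/lib/python3.6/site-packages")

    # Step 1: delete test modules and bundled test directories.
    for test_file in list(SITE_PACKAGES.rglob("test_*.py")):
        test_file.unlink()
    for tests_dir in list(SITE_PACKAGES.rglob("tests")):
        if tests_dir.is_dir():
            shutil.rmtree(tests_dir)

    # Step 2: prune scipy subpackages that are never imported.
    # Assumption: scipy.optimize pulls in these siblings; verify with your tests.
    KEEP = {"optimize", "sparse", "linalg", "special", "_lib"}
    for sub in (SITE_PACKAGES / "scipy").iterdir():
        if sub.is_dir() and sub.name not in KEEP:
            shutil.rmtree(sub)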

Or:

  1. run compileall to get .pyc files (sketched below)
  2. delete all *.py files and let zappa upload the .pyc files instead
  3. zappa deploy
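
A minimal sketch of that route (the site-packages path is again an assumption):

    import compileall
    import pathlib

    # Assumption: typical virtualenv layout; adjust to your environment.
    PKG_DIR = "venv/lib/python3.6/site-packages"

    # legacy=True writes foo.pyc next to foo.py rather than into __pycache__/,
    # so modules stay importable after the .py sources are deleted.
    compileall.compile_dir(PKG_DIR, legacy=True, quiet=1)

    # Remove the sources so only the byte-compiled files get zipped.
    for src in pathlib.Path(PKG_DIR).rglob("*.py"):
        src.unlink()

Note that .pyc files are usually about the same size as the sources, so the saving here tends to be modest compared with pruning whole subpackages.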

I've had issues with slim_handler: true; either my connection drops and the upload fails, or some other error occurs and, at ~25% of the upload to S3, I get Could not connect to the endpoint URL. For the purposes of this question, I'd like to get the dependencies down to a manageable level.

Nevertheless, over half a gig of dependencies with the main app being less than 2MB has to be some sort of record.

My questions are:

  1. What is the unzipped limit for AWS? Is it 250MB or 500MB?
  2. Am I on the right track with the above method for reducing package sizes?
  3. Is it possible to go a step further and use .pyz files?
  4. Are there any standard utilities out there that help with the above?
  5. Is there no tree shaking library for python?

Answering your questions:

  1. The AWS limit for unpacked code is 250MB (see https://hackernoon.com/exploring-the-aws-lambda-deployment-limits-9a8384b0bec3).
  2. I would suggest going with the second method and compiling everything. I think you should also consider the Serverless Framework; it does not force you to create a virtualenv, which is very heavy.

I've seen that all your packages can be compressed down to 83MB (just the packages).

My workaround would be:

  1. use the Serverless Framework (consider moving from Flask directly to API Gateway; see the handler sketch after this list)
  2. install your packages locally in the same folder using:

     pip install -r requirements.txt -t . 
  3. try your method of compiling to .pyc files, and remove the .py sources.

  4. Deploy:

     sls deploy
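
On point 1, a minimal sketch of what "moving from Flask directly to API Gateway" looks like (names are illustrative, and this assumes API Gateway's Lambda proxy integration):

    import json

    def handler(event, context):
        # With the Lambda proxy integration, API Gateway passes the whole
        # HTTP request in the event dict; no WSGI layer is involved.
        body = json.loads(event.get("body") or "{}")
        result = {"echo": body}  # your real logic goes here
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(result),
        }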

Hope it helps.
