
How to minimize the Docker image of a Python 3.7 application

I would like to dockerize a Python program with this Dockerfile:

FROM python:3.7-alpine

COPY requirements.pip ./requirements.pip

RUN python3 -m pip install --upgrade pip

RUN pip install -U setuptools

RUN apk update

RUN apk add --no-cache --virtual .build-deps gcc python3-dev musl-dev openssl-dev libffi-dev g++ && \
    python3 -m pip install -r requirements.pip --no-cache-dir && \
    apk --purge del .build-deps

ARG APP_DIR=/app

RUN mkdir -p ${APP_DIR} 

WORKDIR ${APP_DIR} 

COPY app . 

ENTRYPOINT [ "python3", "run.py" ]

And this is my requirements.pip file:

pysher~=0.5.0

redis~=2.10.6

flake8~=3.5.0

pandas==0.23.4

Because of pandas, the Docker image is 461MB; without pandas it is 131MB.

I was wondering how to make it smaller, so I built a binary from my application using:

pyinstaller run.py --onefile

It built a 38MB binary. When I run it, it works fine. So I built a Docker image from this Dockerfile:

FROM alpine:3.4

ARG APP_DIR=/app
RUN mkdir -p ${APP_DIR}
WORKDIR ${APP_DIR}

COPY app/dist/run run

ENTRYPOINT [ "/bin/sh", "/app/run" ]

Basically, I just copied my run binary into the /app directory. It looks fine, and the image is only 48.8MB. But when I run the container, I get an error:

$ docker run --rm --name myapp myminimalimage:latest
/app/run: line 1: syntax error: unexpected "("

Then I thought there might be a problem with sh, so I installed bash by adding these 3 lines to the Dockerfile:

RUN apk update

RUN apk upgrade

RUN apk add bash

The image was built, but when I run it I get an error again:

$ docker run --rm --name myapp myminimalimage:latest
/app/run: /app/run: cannot execute binary file

My questions:

  1. Why is the image in the first step so big? Can I reduce its size somehow, for example by choosing which parts of the pandas package to install?

  2. Why does my binary work fine on my system (Kubuntu 18.10) but not in alpine:3.4? Should I use another image, or install something in order to run it?

  3. What is the best way to build a minimal image with my app? Is it one of the approaches mentioned above, or are there other ways?

On sizes, make sure you always pass --no-cache-dir when using pip (you use it once, but not in the other cases). Similarly, combine your uses of apk and make sure the last step clears the apk cache so that it never gets frozen into an image layer. For example, replace your three separate RUN instructions with RUN apk update && apk upgrade && apk add bash && rm -rf /var/cache/apk/* ; this achieves the same effect in a single layer that doesn't keep the apk cache around.

Example:

FROM python:3.7-alpine

COPY requirements.pip ./requirements.pip

# Avoid pip cache, use consistent command line with other uses, and merge simple layers
RUN python3 -m pip install --upgrade --no-cache-dir pip && \
    python3 -m pip install --upgrade --no-cache-dir setuptools

# Combine update and add into same layer, clear cache explicitly at end
RUN apk update && apk add --no-cache --virtual .build-deps gcc python3-dev musl-dev openssl-dev libffi-dev g++ && \
    python3 -m pip install -r requirements.pip --no-cache-dir && \
    apk --purge del .build-deps && rm -rf /var/cache/apk/*

Don't expect it to do much (you already used --no-cache-dir on the big pip operation), but it's something. pandas is a huge monolithic package, dependent on other huge monolithic packages; there is a limit to what you can accomplish here.

Keep in mind that if you don't use Alpine, you won't need a compiler, since pip can just install prebuilt wheels. This makes everything simpler, e.g. you don't need to install and then uninstall compilers. The image is slightly bigger, but only slightly.

(See here for more about why I'm not a fan of Alpine Linux: https://pythonspeed.com/articles/base-image-python-docker-images/ )
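
The non-Alpine approach could look roughly like the sketch below. It assumes the Debian-based python:3.7-slim tag (not mentioned in the question) and that every pinned requirement installs without compilation on that image; if a package had no wheel, pip would fall back to building from source and the build would then need a compiler after all.

```dockerfile
# Sketch only: python:3.7-slim is an assumed base image, not from the question
FROM python:3.7-slim

ARG APP_DIR=/app

# WORKDIR creates the directory if it does not exist, so no mkdir is needed
WORKDIR ${APP_DIR}

COPY requirements.pip ./requirements.pip

# No build dependencies to add or purge; pip downloads prebuilt wheels
# where they are available and skips its download cache
RUN python3 -m pip install --no-cache-dir -r requirements.pip

COPY app .

ENTRYPOINT [ "python3", "run.py" ]
```

Copying requirements.pip before the application code keeps the dependency layer cached when only files under app change.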
