I would like to dockerize a Python program with this Dockerfile:
FROM python:3.7-alpine
COPY requirements.pip ./requirements.pip
RUN python3 -m pip install --upgrade pip
RUN pip install -U setuptools
RUN apk update
RUN apk add --no-cache --virtual .build-deps gcc python3-dev musl-dev openssl-dev libffi-dev g++ && \
python3 -m pip install -r requirements.pip --no-cache-dir && \
apk --purge del .build-deps
ARG APP_DIR=/app
RUN mkdir -p ${APP_DIR}
WORKDIR ${APP_DIR}
COPY app .
ENTRYPOINT [ "python3", "run.py" ]
and this is my requirements.pip file:
pysher~=0.5.0
redis~=2.10.6
flake8~=3.5.0
pandas==0.23.4
Because of pandas, the Docker image is 461MB; without pandas it is 131MB.
I was thinking about how to make it smaller, so I built a binary file from my application using:
pyinstaller run.py --onefile
It built a 38MB binary file. When I run it, it works fine. So I built a Docker image from this Dockerfile:
FROM alpine:3.4
ARG APP_DIR=/app
RUN mkdir -p ${APP_DIR}
WORKDIR ${APP_DIR}
COPY app/dist/run run
ENTRYPOINT [ "/bin/sh", "/app/run" ]
Basically, I just copied my run binary file into the /app directory. It looks fine; the image is just 48.8MB. When I run the container, I receive an error:
$ docker run --rm --name myapp myminimalimage:latest
/app/run: line 1: syntax error: unexpected "("
Then I thought maybe there was a problem with sh, so I installed bash by adding 3 lines to the Dockerfile:
RUN apk update
RUN apk upgrade
RUN apk add bash
The image was built, but when I run it there is an error again:
$ docker run --rm --name myapp myminimalimage:latest
/app/run: /app/run: cannot execute binary file
My questions:
Why is the image in the first step so big? Can I minimize the size somehow, for example by choosing what to install from the pandas package?
Why does my binary file work fine on my system (Kubuntu 18.10) but fail to run in alpine:3.4? Should I use another image or install something to run it?
What is the best way to build a minimal image with my app? One of the approaches mentioned above, or are there other ways?
On sizes, make sure you always pass --no-cache-dir when using pip (you use it once, but not in other cases). Similarly, combine uses of apk and make sure the last step is to clear the apk cache so it never gets frozen in an image layer, e.g. replace your three separate RUNs with RUN apk update && apk upgrade && apk add bash && rm -rf /var/cache/apk/*; this achieves the same effect in a single layer that doesn't keep the apk cache around.
Example:
FROM python:3.7-alpine
COPY requirements.pip ./requirements.pip
# Avoid pip cache, use consistent command line with other uses, and merge simple layers
RUN python3 -m pip install --upgrade --no-cache-dir pip && \
python3 -m pip install --upgrade --no-cache-dir setuptools
# Combine update and add into same layer, clear cache explicitly at end
RUN apk update && apk add --no-cache --virtual .build-deps gcc python3-dev musl-dev openssl-dev libffi-dev g++ && \
python3 -m pip install -r requirements.pip --no-cache-dir && \
apk --purge del .build-deps && rm -rf /var/cache/apk/*
Don't expect it to do much (you already used --no-cache-dir on the big pip operation), but it's something. pandas is a huge monolithic package, dependent on other huge monolithic packages; there is a limit to what you can accomplish here.
Keep in mind that if you don't use Alpine, you won't need a compiler, since pip can just install prebuilt wheels. This makes everything simpler, e.g. you don't need to install and then uninstall compilers. The resulting image is slightly bigger, but only slightly.
(See here for more about why I'm not a fan of Alpine Linux: https://pythonspeed.com/articles/base-image-python-docker-images/ )
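A rough sketch of what the non-Alpine variant might look like. The python:3.7-slim base tag is an assumption (any Debian-based Python image should behave similarly), and it assumes every entry in requirements.pip is either pure Python or publishes a wheel for this platform; the file names and app layout are taken from the question.

```dockerfile
# Assumption: Debian-based slim image, so pip can install prebuilt wheels
# and no compiler toolchain has to be installed or removed.
FROM python:3.7-slim

ARG APP_DIR=/app
# WORKDIR creates the directory if it does not exist, so no mkdir is needed
WORKDIR ${APP_DIR}

# Copy the requirements first so this layer is reused when only app code changes
COPY requirements.pip ./requirements.pip
RUN python3 -m pip install --no-cache-dir -r requirements.pip

COPY app .
ENTRYPOINT [ "python3", "run.py" ]
```

If one of the dependencies turns out to ship no wheel and needs compiling, this sketch would fail at the pip step and a build toolchain would have to be added back.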