Machine Learning Tools Docker Image Size Issue
I need a Docker container with the following packages installed for some computational analysis. The packages listed below are in the requirements.txt file.
boto3 = "*"
nltk ="*"
pandas = "*"
scikit-learn = "*"
sentence_transformers = "*"
spacy = {extras = ["lookups"],version = "*"}
streamlit = "*"
tensorflow = "*"
unidecode = "*"
I have written a Dockerfile for this. The issue I am facing is the size of the Docker image, which is around 6 GB (6.42 GB to be exact). Can anybody help me reduce the size of the Docker image?
Here is the Dockerfile:
FROM python:3.7-slim-buster as base
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
COPY . /opt/program
WORKDIR /opt/program/
RUN chmod +x train
# Install dependencies
RUN apt-get update \
&& apt-get upgrade -y \
&& apt-get autoremove -y \
&& apt-get install -y \
gcc \
build-essential \
zlib1g-dev \
wget \
unzip \
cmake \
python3-dev \
gfortran \
libblas-dev \
liblapack-dev \
libatlas-base-dev \
&& apt-get clean
# Install Python packages
RUN pip install --upgrade pip \
&& pip install \
ipython[all] \
nose \
matplotlib \
pandas \
scipy \
sympy \
&& rm -fr /root/.cache
RUN pip install --install-option="--prefix=/install" -r requirements.txt
You are installing a lot into that image, so it will be fairly big regardless, but there are a few things you can do about it.
The minor one: remove /var/lib/apt/lists/* once you are done installing packages via apt. Append it to the same RUN instruction that runs apt-get install, because files deleted in a later RUN still take up space in the earlier layer:
&& rm -rf /var/lib/apt/lists/*
The major one: judging from the contents of the Dockerfile, I guess it is used to train a model. That requires training data, which can take a lot of space since you are copying everything into the image. The data does not need to be present in the image; it only needs to be available in the container created from the image.
Instead of copying everything into the image, copy only the files needed to run the logic and load the data some other way. One such way is to bind mount the data into the container. You could store the data in a separate folder, say ./data, and list this folder in your .dockerignore file (so that it is not copied over). Then, depending on how you launch the container, you can specify the bind mount, for example:
docker container run -v "$(pwd)/data":/<path-inside-image> ...
Replace <path-inside-image> with the path where the data should be located, but be careful not to mount onto a directory that already holds essential files, since those will be obscured by the mounted folder.
If a bind mount is not a viable solution for you, you will need to find another way to load the data into the container, for example pulling it from the internet or from network-attached storage once the container is running.
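As a rough sketch of the "copy only what is needed" idea, the fragment below replaces the blanket COPY with an explicit list of files. It assumes that train is a single executable file (as the chmod in the question suggests) and that the data lives in ./data on the host; the mount point /opt/program/data is an illustrative choice, not something given in the question. Dependency installation is left out to keep the focus on the COPY step.

```dockerfile
FROM python:3.7-slim-buster
WORKDIR /opt/program

# Copy only the files the training logic needs, instead of `COPY . /opt/program`.
# Assumption: `train` is a single executable file next to requirements.txt.
COPY requirements.txt train ./
RUN chmod +x train

# There is deliberately no COPY for ./data. The data is expected to be
# bind-mounted when the container starts, for example at /opt/program/data
# (an assumed path), so it never becomes part of an image layer.
```

Listing ./data in .dockerignore as well keeps it out of the build context, which also speeds up the build.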
Run rm -rf /var/lib/apt/lists/* after apt-get install, in the same RUN instruction, such as:
RUN apt-get update && apt-get install -y \
ca-certificates \
netbase \
&& rm -rf /var/lib/apt/lists/*
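Do not do it in a separate RUN like this, because the package lists are then still stored in the layer created by the first RUN:
RUN apt-get update && apt-get install -y \
ca-certificates \
netbase
RUN rm -rf /var/lib/apt/lists/*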
Also add --no-install-recommends:
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
netbase \
&& rm -rf /var/lib/apt/lists/*
--no-install-recommends means: do not install recommended but non-essential dependency packages.
For example:
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc \
g++ \
&& pip install cython && apt-get remove -y gcc g++ \
&& rm -rf /var/lib/apt/lists/*
Some software, such as gcc, is only needed while installing other packages, so it can be removed once the install has finished.
Also tell pip not to keep its download cache, for example:
RUN pip install --no-cache-dir -r requirements.txt
I am not sure about this one. In other people's Dockerfiles, files are downloaded, used, and finally deleted within a single RUN, rather than being copied into the image.
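The download, use, and delete pattern described above might look roughly like the following. The URL and the /opt/assets target directory are placeholders made up for illustration; the point is only that the archive and the download tools are removed in the same RUN that fetched them, so they never end up in a layer.

```dockerfile
FROM python:3.7-slim-buster

# Everything happens in one RUN, so the archive and the temporary tools
# are gone before the layer is committed.
# https://example.com/assets.zip and /opt/assets are hypothetical placeholders.
RUN apt-get update \
 && apt-get install -y --no-install-recommends wget unzip ca-certificates \
 && mkdir -p /opt/assets \
 && wget -q -O /tmp/assets.zip https://example.com/assets.zip \
 && unzip -q /tmp/assets.zip -d /opt/assets \
 && rm /tmp/assets.zip \
 && apt-get purge -y --auto-remove wget unzip \
 && rm -rf /var/lib/apt/lists/*
```

ca-certificates is installed explicitly because --no-install-recommends would otherwise skip it, and wget needs it for HTTPS downloads.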
If you use TensorFlow or another AI application, you may have model data that is several GB in size. It is better to download it when the container runs (via FTP, object storage, or some other way) or to mount it, rather than putting it in the image.
Just from my experience: if you use git for version control, the .git folder may be very big. The command COPY . /XXX will copy .git into the image. Find a way to filter out .git. What I use:
FROM alpine:3.12 as MID
COPY XXX /XXX/
COPY ... /XXX/
FROM image:youneed
COPY --from=MID /XXX/ /XXX/
RUN apt-get update && xxxxx
CMD ["python","app.py"]
Or use .dockerignore.
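The same multi-stage idea can also be used to keep compilers out of the final image, as an alternative to installing and then removing gcc in one RUN. The sketch below is not taken from the question or the answers. It assumes that requirements.txt is a valid pip requirements file, that gcc and g++ are the only build tools needed, and that /opt/venv is an acceptable location for a virtual environment. Some packages may also need shared libraries at run time, which would have to be installed in the second stage.

```dockerfile
# Stage 1: build environment with compilers; it is discarded after the build.
FROM python:3.7-slim-buster AS builder
RUN apt-get update \
 && apt-get install -y --no-install-recommends gcc g++ \
 && rm -rf /var/lib/apt/lists/*

# Install the Python packages into an isolated virtual environment.
# /opt/venv is an assumed path, chosen so it can be copied as one directory.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: runtime image; same base, but without the compilers.
FROM python:3.7-slim-buster
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:/opt/program:${PATH}"
WORKDIR /opt/program
COPY train .
RUN chmod +x train
```

Both stages use the same base image and the same /opt/venv path, so the copied environment keeps pointing at a valid Python interpreter.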
# Are wget, cmake, and some of the others really necessary?
COPY . /opt/program
WORKDIR /opt/program/
# Install dependencies
RUN chmod +x train && apt-get update \
&& apt-get upgrade -y \
&& apt-get autoremove -y \
&& apt-get install -y \
gcc \
build-essential \
zlib1g-dev \
wget \
unzip \
cmake \
python3-dev \
gfortran \
libblas-dev \
liblapack-dev \
libatlas-base-dev \
&& apt-get clean && pip install --upgrade pip \
&& pip install --no-cache-dir \
ipython[all] \
nose \
matplotlib \
pandas \
scipy \
sympy \
&& pip install --no-cache-dir --install-option="--prefix=/install" -r requirements.txt \
# Just an experiment, to find out which packages can be removed.
&& apt-get remove -y gcc unzip cmake \
&& rm -rf /var/lib/apt/lists/* \
&& rm -fr /root/.cache
Of course, this way you may get a somewhat smaller image, but the build process will not benefit from Docker's layer cache, because everything sits in one RUN. So while you are still experimenting to find out which packages can be removed, split it into two or three RUN commands to make better use of the cache.
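A cache-friendly layout for that experimentation phase could look like the sketch below. The package selection is only an example, and train is again assumed to be a single file. Instructions that change rarely come first, so editing the code does not force the apt and pip steps to run again.

```dockerfile
FROM python:3.7-slim-buster
WORKDIR /opt/program

# Layer 1: system packages. Rebuilt only when this instruction changes.
RUN apt-get update \
 && apt-get install -y --no-install-recommends gcc build-essential \
 && rm -rf /var/lib/apt/lists/*

# Layer 2: Python dependencies. Rebuilt only when requirements.txt changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Layer 3: application code. It changes most often, so it goes last.
COPY train .
RUN chmod +x train
```

Once the set of required packages is settled, the install and removal steps can be merged back into a single RUN to get the smaller image.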
Hope this helps.