
Decrease docker build size, share conda environment between two images

I'm trying to build a webapp using AWS. I've got a docker-compose.yml that builds two images: a service image (running a Flask server script) and a worker image (doing all the calculations sent to it from the Flask server).

services:
  worker:
    image: co2gasp/worker:latest
    build: ./worker_app
  web:
    image: co2gasp/service:latest
    build: ./server_app

The problem I'm having is that I'm running into errors, particularly with memory, when building the containers, and I want to keep the build as small as possible. In short, the Dockerfile is identical for both images (see below), and it builds two identical conda environments in the two separate images; when it builds the second image, it runs out of memory. What I'm wondering is whether there is any way to build a single environment and share it between both images?

FROM continuumio/miniconda3

RUN apt-get update -y
RUN apt-get install zip -y
RUN apt-get install awscli -y
WORKDIR /app
## Create the environment:
COPY environment.yml .
RUN conda env create -f environment.yml

COPY ./PHREEQC /PHREEQC
COPY ./service /service
COPY ./temp_files /temp_files
COPY ./INPUT_DATA /INPUT_DATA
COPY ./PHREEQC/phreeqc_files/database/pitzer.dat /bin/pitzer.dat
COPY ./PHREEQC/phreeqc_files/bin/phreeqc /bin/phreeqc
ENV PATH=${PATH}:/bin/phreeqc
ENV PATH=${PATH}:/bin/pitzer.dat
ENV PATH=${PATH}:/bin
RUN echo 'Adding new'
RUN echo "conda activate myenv" >> ~/.bashrc

# Make RUN commands use the new environment:
SHELL ["conda", "run", "-n", "myenv", "/bin/bash", "-c"]

# Demonstrate the environment is activated:
RUN echo "Make sure flask is installed:"
RUN python -c "import flask"

RUN echo "Copy service directory"

WORKDIR /service
ENTRYPOINT ["conda", "run", "--no-capture-output", "-n", "myenv", "python","worker.py"]

You indicate in the comments that the two images are identical, aside from the final command. You can override the image's ENTRYPOINT when you run the container using the Compose entrypoint: directive:

version: '3.8'
services:
  worker:
    build: .
  web:
    build: .
    entrypoint: conda run --no-capture-output -n myenv python web.py

We can do better than this, though. An image can have both an ENTRYPOINT and a CMD, and if both are present, Docker combines them into a single command string. A typical setup is to set CMD to a specific command you want to run, and ENTRYPOINT to some sort of wrapper that accepts that command as additional arguments. In your case, that's exactly the syntax that conda run will accept, so you can say

ENTRYPOINT ["conda", "run", "--no-capture-output", "-n", "myenv"]
CMD ["python", "worker.py"]

This can be improved even more by making sure the script is executable

# (run on the host and commit to source control)
chmod +x worker.py

and that it begins with a "shebang" line

#!/usr/bin/env python3
# (must be the very very first line, absolutely nothing before it)

Now you can directly run ./worker.py without naming the Python interpreter, or set it in your Dockerfile

ENTRYPOINT ["conda", "run", "--no-capture-output", "-n", "myenv"]
CMD ["./worker.py"]

This is relevant because you can separately override entrypoint: and command: in docker-compose.yml. Overriding command: replaces the Dockerfile CMD, but leaves the ENTRYPOINT in place.

version: '3.8'
services:
  worker:
    build: .
  web:
    build: .
    command: ./web.py
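
Since the Dockerfile ENTRYPOINT stays in place, the web container effectively runs conda run --no-capture-output -n myenv ./web.py.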

Technically this builds the image twice, and if you look at the docker images output, you will see both "worker" and "web" images. However, these images will be identical, even having the same image ID. The second build should run entirely from the Docker build cache and be extremely quick.
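
If you'd rather have Compose build the image only once, one possible variation (a sketch, assuming a shared tag like co2gasp/app is acceptable) is to give both services the same image: name and let only one of them build it:

version: '3.8'
services:
  worker:
    build: .
    image: co2gasp/app:latest   # builds the image and tags it
  web:
    image: co2gasp/app:latest   # reuses the same tag, no second build
    depends_on:
      - worker
    command: ./web.py

Depending on your Compose version you may need to run docker-compose build (or docker-compose up --build) once so that the shared tag exists before the web container is created.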

(In the question you mention an out-of-memory issue. If, as you say in comments, the two images are identical up to the final ENTRYPOINT line, then the second image build should come from the Docker cache and shouldn't require significant memory. The approach I describe here, with the same build: for both container images, could potentially run into the same issue.)
