How to install packages in Jupyter Notebook running in Docker Container

I tried to set up PySpark on Windows 10. After running into various problems, I decided to use a Docker image instead, and it worked great.

The hello world script works. However, I'm not able to install any packages in the Jupyter instance running in Docker. Please advise.

Normally, I can use the code below in the Anaconda terminal:

Issue:

The following command must be run outside the IPython shell:

    $ pip install fastavro

I cannot find out how to install packages inside the Docker container. Please advise.

Resources:

  • Docker image - jupyter/pyspark-notebook
  • Operating System - Windows 10

In a Jupyter cell or IPython shell, you can run:

!pip install PACKAGENAME 

This installs the package(s). Note the '!' prefix, which tells IPython to pass the line to the system shell.

Update

When you have multiple environments, use the Python executable (sys.executable) of the environment the notebook kernel is running in, so the package is installed into that environment:

import sys

!{sys.executable} -m pip install PACKAGENAME
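The same idea can be written in plain Python, without the '!' shell escape. The following is only a minimal sketch: the helper names pip_install_command and ensure_installed are hypothetical (they are not part of Jupyter or pip), and it assumes the container has network access to a package index.

```python
import importlib.util
import subprocess
import sys


def pip_install_command(package):
    """Build the pip command for the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_installed(package, import_name=None):
    """Install `package` with pip unless its module can already be imported.

    `import_name` is for packages whose module name differs from the
    package name. Returns True if an install was attempted, False if the
    module was already available.
    """
    module = import_name or package
    if importlib.util.find_spec(module) is not None:
        return False
    # check_call raises CalledProcessError if pip exits with an error.
    subprocess.check_call(pip_install_command(package))
    # Let the import system notice the newly installed files.
    importlib.invalidate_caches()
    return True
```

In a notebook cell you would call ensure_installed("fastavro") and then import fastavro as usual. Packages installed this way live only in the running container and are lost when it is removed.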

It makes sense to save the updated container as an image, so you don't need to install those packages each time. One way to do this is to build your own image. Let's say you want to use the jupyter/datascience-notebook image from the Jupyter Docker Stacks. First, create a file named Dockerfile (without an extension). This file should contain the following instructions:

# Start from a core stack version
FROM jupyter/datascience-notebook:6b49f3337709
# Install in the default python3 environment
RUN pip install --quiet --no-cache-dir 'flake8==3.9.2' && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

Instead of pip, you can use conda or mamba:

# Install a package into the default (python 3.x) environment and clean up
# after the installation
RUN mamba install --quiet --yes some-package && \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

# The same with conda
RUN conda install --quiet --yes some-package && \
    conda clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"
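If you keep your package list in code, the pip variant of the Dockerfile above can be generated rather than written by hand. This is only a sketch: render_dockerfile is a hypothetical helper, and the base image and package pin are simply the ones used in the example above.

```python
def render_dockerfile(base_image, packages):
    """Return Dockerfile text that installs `packages` with pip on `base_image`."""
    # Quote each requirement so version specifiers survive the shell.
    quoted = " ".join(f"'{p}'" for p in packages)
    return (
        f"FROM {base_image}\n"
        f"RUN pip install --quiet --no-cache-dir {quoted} && \\\n"
        '    fix-permissions "${CONDA_DIR}" && \\\n'
        '    fix-permissions "/home/${NB_USER}"\n'
    )


if __name__ == "__main__":
    text = render_dockerfile(
        "jupyter/datascience-notebook:6b49f3337709", ["flake8==3.9.2"]
    )
    # Write the result to a file named Dockerfile in the current directory.
    with open("Dockerfile", "w", encoding="utf-8") as f:
        f.write(text)
```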

Then go to the directory containing your newly created Dockerfile and run:

$ docker image build --tag jupyter/base-notebook:my_version .

where --tag sets the name of your image, in the form repository_name:tag_name. And don't forget the single dot (.) at the end! It is the path to the directory containing the Dockerfile, which Docker uses as the build context.

When Docker has finished building the image, you can find it in the list of images using docker image ls:

REPOSITORY                           TAG               IMAGE ID       CREATED             SIZE
jupyter/base-notebook                my_version        3cf0f4683b46   11 minutes ago      1.12GB

Now you can use your newly created image with the packages installed:

$ docker run -p 8888:8888 jupyter/base-notebook:my_version

Another way to save a modified image is to use the docker commit command. You can install the desired packages directly from the Jupyter notebook and then save the changes using:

$ docker commit CONTAINER_ID  jupyter/base-notebook:my_version

You can find CONTAINER_ID using the docker ps command, which lists running containers.
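The two steps can also be scripted from the host. The sketch below assumes the docker CLI is on the host's PATH and that it is run on the host, not inside the notebook; running_container_ids and commit_command are hypothetical helper names.

```python
import subprocess


def running_container_ids():
    """Return the IDs of running containers, as printed by `docker ps -q`."""
    result = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=True
    )
    return result.stdout.split()


def commit_command(container_id, image):
    """Build the `docker commit` command that saves a container as an image."""
    return ["docker", "commit", container_id, image]


def commit_container(container_id, image):
    """Run `docker commit`; raises CalledProcessError if docker fails."""
    subprocess.run(commit_command(container_id, image), check=True)
```

For example, commit_container(running_container_ids()[0], "jupyter/base-notebook:my_version") would commit the first running container listed, so check which container that is when more than one is running.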
