PyCharm debugging using Docker with GPUs
I want to debug a Python application in PyCharm, with the interpreter set to a custom Docker image that uses Tensorflow and therefore requires a GPU. The problem is that, as far as I can tell, PyCharm's command building doesn't offer a way to discover available GPUs.
I can enter a container with the following command, specifying which GPUs to make available (--gpus):

docker run -it --rm --gpus=all --entrypoint="/bin/bash" 3b6d609a5189 # image has an entrypoint, so I overwrite it
Inside the container, I can run nvidia-smi to see that a GPU is found, and confirm that Tensorflow finds it, using:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()
# physical_device_desc: "device: 0, name: Quadro P2000, pci bus id: 0000:01:00.0, compute capability: 6.1"
If I don't use the --gpus flag, no GPUs are discovered, as expected.

Note: with Docker version 19.03 and above, Nvidia runtimes are supported natively, so there is no need for nvidia-docker, and the docker run argument --runtime=nvidia is also deprecated. Relevant thread.
Here is the configuration for the run:

(I realise some of those paths might look incorrect, but that isn't an issue for now)
I set the interpreter to point to the same Docker image and run the Python script, setting a custom LD_LIBRARY_PATH as an argument to the run that matches where libcuda.so is located in the Docker image (I found it interactively inside a running container), but still no device is found:
The error message shows that the CUDA library was able to be loaded (i.e. it was found on that LD_LIBRARY_PATH), but the device was still not found. This is why I believe the docker run argument --gpus=all must be set somewhere. I can't find a way to do that in PyCharm.
Other things I tried:

1. Adding --gpus=all to the run options in PyCharm, but that seems not to be supported by the parser of those options.

2. Setting the default runtime to nvidia in the docker daemon by including the following config in /etc/docker/daemon.json:

{
  "runtimes": {
    "nvidia": {
      "runtimeArgs": ["gpus=all"]
    }
  }
}
I am not sure of the correct format for this, however. I have tried a few variants of the above, but nothing got the GPUs recognised. The example above could at least be parsed, and allowed me to restart the docker daemon without errors.
3. I noticed that the official Tensorflow Docker images install a package (via apt install) called nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0, which sounds like a great tool, albeit seemingly just for TensorRT. I added it to my Dockerfile as a shot in the dark, but unfortunately it did not fix the issue.
4. Adding NVIDIA_VISIBLE_DEVICES=all etc. to the environment variables of the PyCharm configuration, with no luck.
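When experimenting with environment variables like these, it can help to confirm from inside the container which ones the Python process actually received from PyCharm's run configuration. A minimal sketch (the variable names are just the ones mentioned above; note that NVIDIA_VISIBLE_DEVICES is only honored by the nvidia runtime, so seeing it set does not by itself mean the GPU is exposed):

```python
import os

# Report the GPU-related environment variables as seen by this Python
# process, to verify what the run configuration actually passed through.
def gpu_env():
    names = ("NVIDIA_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES", "LD_LIBRARY_PATH")
    return {name: os.environ.get(name) for name in names}

for name, value in gpu_env().items():
    print(f"{name}={value}")
```

Variables that were not passed through print as None.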
I am using Python 3.6, PyCharm Professional 2019.3 and Docker 19.03.
Docker GPU support is now available in PyCharm 2020.2 without a global default-runtime. Just set --gpus all under the 'Docker container settings' section in the configuration window.
If the error no NVIDIA GPU device is present: /dev/nvidia0 does not exist still occurs, make sure to uncheck Run with Python Console, because it's still not working properly.
It turns out that attempt 2. in the "Other things I tried" section of my post was the right direction, and the following allowed PyCharm's remote interpreter (the Docker image) to locate the GPU, just as the terminal was able to.

I added the following into /etc/docker/daemon.json:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
It is also necessary to restart the docker service after saving the file:

sudo service docker restart

Note: this kills all running Docker containers on the system.
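Since I had trouble getting the format of this file right, a small script can sanity-check it before restarting the daemon. A minimal sketch, assuming the config above: it only verifies that the JSON parses and that the two keys are present, not that the runtime binary actually exists:

```python
import json

# Check that a daemon.json parses as JSON, sets "default-runtime" to
# "nvidia", and defines a matching entry under "runtimes".
def has_nvidia_default_runtime(config_text):
    cfg = json.loads(config_text)
    return (cfg.get("default-runtime") == "nvidia"
            and "nvidia" in cfg.get("runtimes", {}))

sample = """{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}"""
print(has_nvidia_default_runtime(sample))  # True
```

In practice you would read the real file from /etc/docker/daemon.json instead of the inline sample.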
Check out Michał De's answer, it works. However, the interactive console is still broken. With some docker inspect-ing I figured out that using the option Run with Python Console overwrites the Docker config, ignoring the provided --gpus all option. I couldn't stand such a loss in quality of life and forced PyCharm to play nice using docker-compose.
Behold, the WORKAROUND.
1. How to test the GPU in Tensorflow
import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))
should return something like
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2. Make sure you have a simple docker container that works
docker pull tensorflow/tensorflow:latest-gpu-jupyter
docker run --gpus all -it tensorflow/tensorflow:latest-gpu-jupyter python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
The last print should be as described in step 1. Otherwise, see the nvidia guide or the tensorflow guide.
3. Create a compose file and test it
version: '3'
# ^ fixes another pycharm bug
services:
  test:
    image: tensorflow/tensorflow:latest-gpu-jupyter
    # ^ or your own
    command: python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
    # ^ irrelevant, will be overridden by pycharm, but useful for testing
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
docker-compose --file your_compose_file up
Again, you should see the same output as described in step 1. Given that step 2 was successful, this should go without surprises.
4. Set up this compose file as the interpreter in PyCharm
5. Enjoy your interactive console while running a GPU-enabled docker.