
cuDNN error when running TensorFlow in Docker

I want to use TensorFlow 2.0 or later in Docker, specifically to run SRGAN ( https://github.com/tensorlayer/srgan ).

My Dockerfile is:

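FROM tensorflow/tensorflow:latest-gpu-py3

ENV HOME=/home
ENV user=hogehoge

WORKDIR $HOME

# Create an unprivileged user (UID 1000) and give it ownership of its home directory
RUN useradd -u 1000 -m -d /home/${user} ${user} \
 && chown -R ${user} /home/${user}

# Extra Python dependencies used by tensorlayer/srgan
RUN pip install tensorlayer easydict

# Switch to that user; the variable defined above is ${user} (lowercase), not ${USER}
USER ${user}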

I build and run the container with:

docker build -t tensorflow .
sudo docker run --rm --gpus all -it -v /media/hikarukondo/Workspace/BLUE_TAG/workspace/:/home/ tensorflow

Inside the container, I run:

python train.py

and get the following output:

2020-01-14 05:39:56.390997: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-01-14 05:39:56.392064: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2020-01-14 05:40:00.523011: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-01-14 05:40:00.542402: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.542772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-01-14 05:40:00.542794: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-01-14 05:40:00.542831: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-01-14 05:40:00.543925: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-01-14 05:40:00.544139: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-01-14 05:40:00.545110: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-01-14 05:40:00.545615: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-01-14 05:40:00.545639: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-14 05:40:00.545738: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.546108: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.546413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-01-14 05:40:00.546665: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-14 05:40:00.567683: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz
2020-01-14 05:40:00.567909: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5795ae0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-14 05:40:00.567922: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-01-14 05:40:00.626426: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.626828: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5776b10 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-01-14 05:40:00.626856: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2020-01-14 05:40:00.627044: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.627339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-01-14 05:40:00.627360: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-01-14 05:40:00.627368: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-01-14 05:40:00.627382: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-01-14 05:40:00.627392: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-01-14 05:40:00.627402: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-01-14 05:40:00.627412: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-01-14 05:40:00.627419: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-14 05:40:00.627460: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.627732: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.628005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-01-14 05:40:00.628040: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-01-14 05:40:00.801827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-14 05:40:00.801853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-01-14 05:40:00.801858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-01-14 05:40:00.802029: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.802406: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.802727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6664 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-01-14 05:40:01.135124: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-14 05:40:01.604467: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-01-14 05:40:01.609256: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "train.py", line 204, in <module>
    evaluate()
  File "train.py", line 171, in evaluate
    G = get_G([1, None, None, 3])
  File "/home/srgan/model.py", line 14, in get_G
    n = Conv2d(64, (3, 3), (1, 1), act=tf.nn.relu, padding='SAME', W_init=w_init)(nin)
  File "/usr/local/lib/python3.6/dist-packages/tensorlayer/layers/core.py", line 225, in __call__
    outputs = self.forward(input_tensors, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorlayer/layers/convolution/simplified_conv.py", line 271, in forward
    name=self.name,
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 1914, in conv2d_v2
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 2011, in conv2d
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 937, in conv2d
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D] name: conv2d_1

Docker version 19.03.5, build

I have one GeForce RTX 2070 installed and available on my machine. My current NVIDIA driver version is 440.33.01.

Am I doing something wrong, or is there an issue with the Docker build?
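The traceback fails at the very first convolution in get_G (a 3x3, 64-filter Conv2d with SAME padding on a 3-channel input), so one way to tell whether the environment rather than the SRGAN code is at fault is to run a single equivalent convolution on its own inside the container. This is a minimal sketch assuming TensorFlow 2.x; the function name cudnn_conv_smoke_test is hypothetical, and it calls tf.nn.conv2d directly instead of going through tensorlayer:

```python
import importlib.util
from typing import Optional, Tuple


def cudnn_conv_smoke_test() -> Optional[Tuple[int, ...]]:
    """Hypothetical check: run one Conv2D shaped like the first layer of get_G.

    Returns the output shape, or None if TensorFlow is not installed.
    On a GPU where cuDNN cannot initialise, tf.nn.conv2d is expected to raise
    UnknownError ("Failed to get convolution algorithm").
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    print("GPUs visible:", tf.config.experimental.list_physical_devices("GPU"))
    x = tf.random.normal([1, 32, 32, 3])   # NHWC input with 3 channels
    w = tf.random.normal([3, 3, 3, 64])    # 3x3 kernel, 3 -> 64 channels
    y = tf.nn.conv2d(x, w, strides=1, padding="SAME")
    return tuple(y.shape)
```

If this raises the same UnknownError, the SRGAN code is not the cause, and the TensorFlow/CUDA/driver setup inside the container is the place to look.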

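Can you try setting

config.gpu_options.allow_growth = True

This is the TF1-style option on a tf.compat.v1.ConfigProto. It makes TensorFlow allocate GPU memory on demand instead of reserving almost all of it at startup (the log above shows 6664 MB reserved on the 7.79 GiB card).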
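In TensorFlow 2.x eager mode there is no session to pass a ConfigProto to, so the equivalent setting is tf.config.experimental.set_memory_growth, or the TF_FORCE_GPU_ALLOW_GROWTH environment variable described in TensorFlow's GPU guide. A minimal sketch, assuming it is called at the very top of train.py before any model is built; the helper name enable_gpu_memory_growth is hypothetical:

```python
import importlib.util
import os


def enable_gpu_memory_growth() -> int:
    """Hypothetical helper: let TensorFlow grow GPU memory on demand.

    Must run before any op touches the GPU; after that, set_memory_growth
    raises RuntimeError. Returns the number of GPUs configured
    (0 if TensorFlow is not installed or no GPU is visible).
    """
    # Environment-variable form of the same setting; it only takes effect
    # if it is set before TensorFlow creates the GPU device.
    os.environ.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")

    if importlib.util.find_spec("tensorflow") is None:
        return 0
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # TF2 counterpart of config.gpu_options.allow_growth = True
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Calling it once before get_G(...) is first used should be enough; the setting applies to the whole process.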


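Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.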
