[英]I have cuDNN error when tensorflow Docker
I want to use including and after tensorflow2.0 in Docker.我想在 Docker 中使用包括和之后的 tensorflow2.0。 I want to use ( https://github.com/tensorlayer/srgan ).
我想使用( https://github.com/tensorlayer/srgan )。
My Dockerfile is我的 Dockerfile 是
FROM tensorflow/tensorflow:latest-gpu-py3
ENV HOME=/home
ENV user=hogehoge
WORKDIR $HOME
RUN useradd -u 1000 -m -d /home/${user} ${user} \
&& chown -R ${user} /home/${user}
RUN pip install tensorlayer easydict
USER ${USER}
I build the container with:我用以下方法构建容器:
docker build -t tensorflow .
sudo docker run --rm --gpus all -it -v /media/hikarukondo/Workspace/BLUE_TAG/workspace/:/home/ tensorflow
in container,在容器中,
python train.py
And then I get.然后我得到。
2020-01-14 05:39:56.390997: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-01-14 05:39:56.392064: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2020-01-14 05:40:00.523011: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-01-14 05:40:00.542402: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.542772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-01-14 05:40:00.542794: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-01-14 05:40:00.542831: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-01-14 05:40:00.543925: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-01-14 05:40:00.544139: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-01-14 05:40:00.545110: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-01-14 05:40:00.545615: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-01-14 05:40:00.545639: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-14 05:40:00.545738: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.546108: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.546413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-01-14 05:40:00.546665: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-14 05:40:00.567683: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz
2020-01-14 05:40:00.567909: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5795ae0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-14 05:40:00.567922: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-01-14 05:40:00.626426: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.626828: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5776b10 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-01-14 05:40:00.626856: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2020-01-14 05:40:00.627044: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.627339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-01-14 05:40:00.627360: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-01-14 05:40:00.627368: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-01-14 05:40:00.627382: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-01-14 05:40:00.627392: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-01-14 05:40:00.627402: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-01-14 05:40:00.627412: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-01-14 05:40:00.627419: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-14 05:40:00.627460: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.627732: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.628005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-01-14 05:40:00.628040: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-01-14 05:40:00.801827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-14 05:40:00.801853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-01-14 05:40:00.801858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-01-14 05:40:00.802029: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.802406: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-14 05:40:00.802727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6664 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-01-14 05:40:01.135124: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-14 05:40:01.604467: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-01-14 05:40:01.609256: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
File "train.py", line 204, in <module>
evaluate()
File "train.py", line 171, in evaluate
G = get_G([1, None, None, 3])
File "/home/srgan/model.py", line 14, in get_G
n = Conv2d(64, (3, 3), (1, 1), act=tf.nn.relu, padding='SAME', W_init=w_init)(nin)
File "/usr/local/lib/python3.6/dist-packages/tensorlayer/layers/core.py", line 225, in __call__
outputs = self.forward(input_tensors, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorlayer/layers/convolution/simplified_conv.py", line 271, in forward
name=self.name,
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 1914, in conv2d_v2
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 2011, in conv2d
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 937, in conv2d
_ops.raise_from_not_ok_status(e, name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D] name: conv2d_1
Docker version 19.03.5, build I have 1 GeForce RTX 2070 installed and available in my machine. Docker 版本 19.03.5,构建 我在我的机器上安装了 1 个 GeForce RTX 2070 并且可用。 My current driver version is 440.33.01.
我当前的驱动程序版本是 440.33.01。
I am wondering if I'm doing something wrong?我想知道我是否做错了什么? Or is there an issue with the Docker build?
还是 Docker 构建有问题?
Can you try setting你可以试试设置
config.gpu_options.allow_growth = True
config.gpu_options.allow_growth = True
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.