How to speed up the 'Adding visible gpu devices' process in tensorflow with a 30 series card?

Question

I get stuck at that step for about 2 minutes every time I run the code. Many people online say it only takes a long time on the first run, but that's not my case. It doesn't break anything, but it's pretty annoying. While it's stuck, system usage is low across the board: CPU, system RAM, GPU, and video memory. I'm using an Nvidia GeForce RTX 3070 on Windows 10 x64 20H2. Here's my environment:

# Name                    Version                   Build  Channel
blas                      1.0                         mkl    defaults
boto3                     1.16.47                  pypi_0    pypi
botocore                  1.19.47                  pypi_0    pypi
ca-certificates           2020.12.8            haa95532_0    defaults
certifi                   2020.12.5        py38haa95532_0    defaults
click                     7.1.2                    pypi_0    pypi
cudatoolkit               11.0.221             h74a9793_0    defaults
freetype                  2.10.4               hd328e21_0    defaults
intel-openmp              2020.2                      254    defaults
jmespath                  0.10.0                   pypi_0    pypi
joblib                    1.0.0                    pypi_0    pypi
jpeg                      9b                   hb83a4c4_2    defaults
keras                     2.4.3                    pypi_0    pypi
libpng                    1.6.37               h2a8f88b_0    defaults
libtiff                   4.1.0                h56a325e_1    defaults
libuv                     1.40.0               he774522_0    defaults
lz4-c                     1.9.2                hf4a77e7_3    defaults
mkl                       2020.2                      256    defaults
mkl-service               2.3.0            py38h196d8e1_0    defaults
mkl_fft                   1.2.0            py38h45dec08_0    defaults
mkl_random                1.1.1            py38h47e9c7a_0    defaults
ninja                     1.10.2           py38h6d14046_0    defaults
numpy                     1.19.2           py38hadc3359_0    defaults
numpy-base                1.19.2           py38ha3acd2a_0    defaults
olefile                   0.46                       py_0    defaults
openssl                   1.1.1i               h2bbff1b_0    defaults
pillow                    8.0.1            py38h4fa10fc_0    defaults
pip                       20.3.3           py38haa95532_0    defaults
python                    3.8.5                h5fd99cc_1    defaults
pytorch                   1.7.1           py3.8_cuda110_cudnn8_0    pytorch
regex                     2020.11.13               pypi_0    pypi
s3transfer                0.3.3                    pypi_0    pypi
sacremoses                0.0.43                   pypi_0    pypi
scikit-learn              0.24.0                   pypi_0    pypi
scipy                     1.6.0                    pypi_0    pypi
sentencepiece             0.1.94                   pypi_0    pypi
setuptools                51.0.0           py38haa95532_2    defaults
six                       1.15.0           py38haa95532_0    defaults
sklearn                   0.0                      pypi_0    pypi
sqlite                    3.33.0               h2a8f88b_0    defaults
tb-nightly                2.5.0a20210101           pypi_0    pypi
threadpoolctl             2.1.0                    pypi_0    pypi
thulac                    0.2.1                    pypi_0    pypi
tk                        8.6.10               he774522_0    defaults
torchaudio                0.7.2                      py38    pytorch
torchvision               0.8.2                py38_cu110    pytorch
transformers              2.1.1                    pypi_0    pypi
typing_extensions         3.7.4.3                    py_0    defaults
vc                        14.2                 h21ff451_1    defaults
vs2015_runtime            14.27.29016          h5e58377_2    defaults
wheel                     0.36.2             pyhd3eb1b0_0    defaults
wincertstore              0.2                      py38_0    defaults
xz                        5.2.5                h62dcd97_0    defaults
zlib                      1.2.11               h62dcd97_4    defaults
zstd                      1.4.5                h04227a9_0    defaults

Although I'm using PyTorch, TensorFlow rather than PyTorch is to blame (according to the logs). I get the same issue with pure TensorFlow 2.3.

2021-01-03 01:17:50.516100: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-03 01:17:52.622054: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-01-03 01:17:52.645796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-03 01:17:52.645998: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-03 01:17:52.649575: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-03 01:17:52.649707: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-03 01:17:52.649827: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-03 01:17:52.649928: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-03 01:17:52.651954: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-03 01:17:52.660165: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-03 01:17:52.660416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-03 01:17:52.660971: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-03 01:17:52.668967: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x19659fe67d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-03 01:17:52.669132: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-03 01:17:52.669395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-03 01:17:52.669576: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-03 01:17:52.669683: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-03 01:17:52.669790: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-03 01:17:52.669896: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-03 01:17:52.670072: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-03 01:17:52.670201: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-03 01:17:52.670365: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-03 01:17:52.670542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-03 01:18:37.097681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-03 01:18:37.097876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-01-03 01:18:37.098025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-01-03 01:18:37.098301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6591 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3070, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-01-03 01:18:37.101296: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1960330d0d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-03 01:18:37.101474: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 3070, Compute Capability 8.6
args:
Namespace(articles_per_title=10, device='0,1,2,3', length=1000, model_config='config/model_config_small.json', model_path='model/final_model', no_wordpiece=False, repetition_penalty=1.0, save_path='generated/', segment=False, temperature=2.0, titles='我', titles_file='', tokenizer_path='cache/vocab_small.txt', topk=10, topp=0)

I noticed that the TensorFlow installation guide for GPU users says that GPUs with the Ampere architecture may encounter this issue, and that it can be solved by using export CUDA_CACHE_MAXSIZE=2147483648 to expand the default JIT cache. That command doesn't work on Windows. I searched my environment variables and none of them is named CUDA_CACHE_MAXSIZE. I tried adding it myself, but it still takes a long time to get past "Adding visible gpu devices: 0". What should I do?
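
A variable added through the Windows settings dialog is generally not seen by terminals or IDEs that were already open, so it can help to confirm that the Python process actually receives it. The following is a minimal sketch of such a check; the helper name describe_cuda_cache_setting is hypothetical and not part of TensorFlow or CUDA.

```python
import os


def describe_cuda_cache_setting(environ=None):
    """Report how CUDA_CACHE_MAXSIZE looks from inside this Python process.

    `environ` defaults to os.environ; a plain dict can be passed for testing.
    This helper is illustrative only and is not part of any library.
    """
    env = os.environ if environ is None else environ
    raw = env.get("CUDA_CACHE_MAXSIZE")
    if raw is None:
        return "CUDA_CACHE_MAXSIZE is not set in this process"
    try:
        size = int(raw)
    except ValueError:
        return f"CUDA_CACHE_MAXSIZE is set to a non-integer value: {raw!r}"
    return f"CUDA_CACHE_MAXSIZE = {size} bytes ({size / 2**30:.2f} GiB)"


if __name__ == "__main__":
    # Run this from the same terminal or IDE that launches the slow script.
    print(describe_cuda_cache_setting())
```

If this reports that the variable is not set, the script is probably being launched from a process that started before the variable was added.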

Answer 1

Go to the Windows environment variables and set CUDA_CACHE_MAXSIZE=2147483648 under System variables. You then need to reboot; after that, everything should be fine.
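
As a per-script alternative, the same value can be placed in the process environment before TensorFlow is imported. This is only a sketch: it assumes the CUDA driver reads CUDA_CACHE_MAXSIZE when it initializes inside the process, and the helper name configure_cuda_cache is hypothetical. The system-wide setting above is the approach reported to work here.

```python
import os

# 2 GiB, the value used in the question and in this answer.
CACHE_BYTES = 2147483648


def configure_cuda_cache(max_bytes=CACHE_BYTES, environ=os.environ):
    """Set CUDA_CACHE_MAXSIZE for this process unless it is already defined.

    setdefault keeps any value that was configured system-wide.
    Returns the value that is in effect afterwards (as a string).
    """
    environ.setdefault("CUDA_CACHE_MAXSIZE", str(max_bytes))
    return environ["CUDA_CACHE_MAXSIZE"]


# Call this at the very top of the script, before any GPU library is imported.
configure_cuda_cache()

# import tensorflow as tf  # import only after the variable has been set
```

Environment variables have to be strings, which is why the byte count is converted with str() before it is stored.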

You are lucky enough to get an Ampere card, since they're out of stock everywhere.
