I get stuck at "Adding visible gpu devices: 0" for about 2 minutes every time I run the code. Many people on the Internet say it only takes a long time on the first run, but that's not my case. Although nothing actually goes wrong, it's pretty annoying. While it's stuck, system usage stays low across the board: CPU, system RAM, GPU, and video memory. I'm using an Nvidia GeForce RTX 3070 on Windows 10 x64 20H2. Here's my environment:
# Name Version Build Channel
blas 1.0 mkl defaults
boto3 1.16.47 pypi_0 pypi
botocore 1.19.47 pypi_0 pypi
ca-certificates 2020.12.8 haa95532_0 defaults
certifi 2020.12.5 py38haa95532_0 defaults
click 7.1.2 pypi_0 pypi
cudatoolkit 11.0.221 h74a9793_0 defaults
freetype 2.10.4 hd328e21_0 defaults
intel-openmp 2020.2 254 defaults
jmespath 0.10.0 pypi_0 pypi
joblib 1.0.0 pypi_0 pypi
jpeg 9b hb83a4c4_2 defaults
keras 2.4.3 pypi_0 pypi
libpng 1.6.37 h2a8f88b_0 defaults
libtiff 4.1.0 h56a325e_1 defaults
libuv 1.40.0 he774522_0 defaults
lz4-c 1.9.2 hf4a77e7_3 defaults
mkl 2020.2 256 defaults
mkl-service 2.3.0 py38h196d8e1_0 defaults
mkl_fft 1.2.0 py38h45dec08_0 defaults
mkl_random 1.1.1 py38h47e9c7a_0 defaults
ninja 1.10.2 py38h6d14046_0 defaults
numpy 1.19.2 py38hadc3359_0 defaults
numpy-base 1.19.2 py38ha3acd2a_0 defaults
olefile 0.46 py_0 defaults
openssl 1.1.1i h2bbff1b_0 defaults
pillow 8.0.1 py38h4fa10fc_0 defaults
pip 20.3.3 py38haa95532_0 defaults
python 3.8.5 h5fd99cc_1 defaults
pytorch 1.7.1 py3.8_cuda110_cudnn8_0 pytorch
regex 2020.11.13 pypi_0 pypi
s3transfer 0.3.3 pypi_0 pypi
sacremoses 0.0.43 pypi_0 pypi
scikit-learn 0.24.0 pypi_0 pypi
scipy 1.6.0 pypi_0 pypi
sentencepiece 0.1.94 pypi_0 pypi
setuptools 51.0.0 py38haa95532_2 defaults
six 1.15.0 py38haa95532_0 defaults
sklearn 0.0 pypi_0 pypi
sqlite 3.33.0 h2a8f88b_0 defaults
tb-nightly 2.5.0a20210101 pypi_0 pypi
threadpoolctl 2.1.0 pypi_0 pypi
thulac 0.2.1 pypi_0 pypi
tk 8.6.10 he774522_0 defaults
torchaudio 0.7.2 py38 pytorch
torchvision 0.8.2 py38_cu110 pytorch
transformers 2.1.1 pypi_0 pypi
typing_extensions 3.7.4.3 py_0 defaults
vc 14.2 h21ff451_1 defaults
vs2015_runtime 14.27.29016 h5e58377_2 defaults
wheel 0.36.2 pyhd3eb1b0_0 defaults
wincertstore 0.2 py38_0 defaults
xz 5.2.5 h62dcd97_0 defaults
zlib 1.2.11 h62dcd97_4 defaults
zstd 1.4.5 h04227a9_0 defaults
Although I'm using PyTorch, according to the logs TensorFlow rather than PyTorch is to blame. I get the same issue with pure TensorFlow 2.3:
2021-01-03 01:17:50.516100: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-03 01:17:52.622054: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-01-03 01:17:52.645796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-03 01:17:52.645998: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-03 01:17:52.649575: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-03 01:17:52.649707: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-03 01:17:52.649827: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-03 01:17:52.649928: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-03 01:17:52.651954: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-03 01:17:52.660165: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-03 01:17:52.660416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-03 01:17:52.660971: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-03 01:17:52.668967: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x19659fe67d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-03 01:17:52.669132: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-01-03 01:17:52.669395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-03 01:17:52.669576: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-03 01:17:52.669683: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-03 01:17:52.669790: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-03 01:17:52.669896: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-03 01:17:52.670072: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-03 01:17:52.670201: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-03 01:17:52.670365: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-03 01:17:52.670542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-03 01:18:37.097681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-03 01:18:37.097876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-01-03 01:18:37.098025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-01-03 01:18:37.098301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6591 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3070, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-01-03 01:18:37.101296: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1960330d0d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-03 01:18:37.101474: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 3070, Compute Capability 8.6
args:
Namespace(articles_per_title=10, device='0,1,2,3', length=1000, model_config='config/model_config_small.json', model_path='model/final_model', no_wordpiece=False, repetition_penalty=1.0, save_path='generated/', segment=False, temperature=2.0, titles='我', titles_file='', tokenizer_path='cache/vocab_small.txt', topk=10, topp=0)
I noticed that the TensorFlow GPU installation guide says GPUs with the Ampere architecture may run into this issue, and that it can be solved by running export CUDA_CACHE_MAXSIZE=2147483648 to expand the default JIT cache. That doesn't work on Windows, since export is a Unix shell command. I searched my environment variables and none of them is named CUDA_CACHE_MAXSIZE. I tried adding it myself, but it still takes a long time to get past "Adding visible gpu devices: 0". What should I do?
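To check whether a newly added variable is actually visible to Python, print it from the same interpreter that runs the script. A process inherits its environment when it starts, so a terminal or IDE that was already open when the variable was added will not see it. The helper below is a hypothetical sketch, not part of TensorFlow or CUDA. It reports CUDA_CACHE_MAXSIZE, CUDA_CACHE_DISABLE and the size of the JIT cache directory. The cache location is an assumption based on the CUDA documentation's defaults: CUDA_CACHE_PATH if set, else %APPDATA%\NVIDIA\ComputeCache on Windows or ~/.nv/ComputeCache on Linux.

```python
import os
from pathlib import Path


def default_cache_dir() -> Path:
    """Guess the CUDA JIT cache directory (assumed defaults, overridable via CUDA_CACHE_PATH)."""
    override = os.environ.get("CUDA_CACHE_PATH")
    if override:
        return Path(override)
    if os.name == "nt":
        # Assumed Windows default: %APPDATA%\NVIDIA\ComputeCache
        return Path(os.environ.get("APPDATA", "")) / "NVIDIA" / "ComputeCache"
    # Assumed Linux default: ~/.nv/ComputeCache
    return Path.home() / ".nv" / "ComputeCache"


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files under path (0 if the directory does not exist)."""
    if not path.is_dir():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def cuda_cache_report() -> dict:
    """Collect the cache-related settings exactly as this process sees them."""
    cache_dir = default_cache_dir()
    return {
        # None means the variable is not visible to this process.
        "CUDA_CACHE_MAXSIZE": os.environ.get("CUDA_CACHE_MAXSIZE"),
        "CUDA_CACHE_DISABLE": os.environ.get("CUDA_CACHE_DISABLE"),
        "cache_dir": str(cache_dir),
        "cache_dir_bytes": dir_size_bytes(cache_dir),
    }


if __name__ == "__main__":
    for key, value in cuda_cache_report().items():
        print(f"{key}: {value}")
```

If CUDA_CACHE_MAXSIZE prints as None, the setting never reached the process. If the cache directory stays small or is emptied between runs, the JIT-compiled kernels are probably not being kept, which would match the delay recurring on every run.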
Just open the Windows Environment Variables dialog and set CUDA_CACHE_MAXSIZE=2147483648 under System variables. Then REBOOT, and everything will be fine.
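If you would rather not change system-wide settings, a per-process alternative is to set the variable from Python before anything initializes CUDA. This sketch assumes that the CUDA driver reads CUDA_CACHE_MAXSIZE when it first initializes in the process. The code must therefore run before importing tensorflow, torch, or any library (such as transformers) that may import them. The helper name set_cuda_jit_cache_size is hypothetical. 2147483648 bytes is 2 GiB, the value from the installation guide.

```python
import os

# 2 GiB, the value suggested for Ampere GPUs (2 * 1024**3 == 2147483648).
JIT_CACHE_BYTES = 2 * 1024**3


def set_cuda_jit_cache_size(max_bytes: int = JIT_CACHE_BYTES) -> None:
    """Hypothetical helper: set CUDA_CACHE_MAXSIZE for this process and its children.

    Must run before CUDA is initialized, i.e. before importing tensorflow/torch.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    # setdefault keeps a value that was already configured system-wide.
    os.environ.setdefault("CUDA_CACHE_MAXSIZE", str(max_bytes))


set_cuda_jit_cache_size()

# Only now import the GPU frameworks, e.g.:
# import tensorflow as tf
# import torch
```

Because setdefault is used, a value already present in the system environment takes precedence. Replace it with a plain assignment if the script should always force 2 GiB. This only affects the script's own process and any processes it starts. The Environment Variables dialog plus a reboot is what makes the setting apply to every run.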
You are lucky enough to get an Ampere card, since they're out of stock everywhere.