I have a problem running the RNN guide from TensorFlow. I am on Ubuntu 18.04.3 and I've installed TensorFlow with GPU support through Anaconda3. When I run code as simple as this:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import numpy as np
import os
import time
# DATASET
path_to_file = './shakespeare.txt'
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
vocab = sorted(set(text))
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)
text_as_int = np.array([char2idx[c] for c in text])
seq_length = 100
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text
dataset = sequences.map(split_input_target)
BATCH_SIZE = 50 #64
# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
# MODEL
vocab_size = len(vocab) #65
embedding_dim = 256
rnn_units = 1024
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model
model = build_model(
    vocab_size=vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)
for input_example_batch, target_example_batch in dataset.take(1):
    a = model(input_example_batch)
    print(a)
I get this result:
2019-11-22 12:26:38.175152: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-11-22 12:26:38.183936: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-22 12:26:38.184486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce 930M major: 5 minor: 0 memoryClockRate(GHz): 0.941
pciBusID: 0000:01:00.0
2019-11-22 12:26:38.201683: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-22 12:26:38.217994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-22 12:26:38.226954: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-11-22 12:26:38.249912: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-11-22 12:26:38.266853: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-11-22 12:26:38.283398: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-11-22 12:26:38.305535: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-22 12:26:38.305704: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-22 12:26:38.306261: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-22 12:26:38.306761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-22 12:26:38.307056: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-22 12:26:38.328890: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2594080000 Hz
2019-11-22 12:26:38.329724: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55de21a15ca0 executing computations on platform Host. Devices:
2019-11-22 12:26:38.329751: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-11-22 12:26:38.365863: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-22 12:26:38.366560: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55de21a17b00 executing computations on platform CUDA. Devices:
2019-11-22 12:26:38.366589: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce 930M, Compute Capability 5.0
2019-11-22 12:26:38.366725: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-22 12:26:38.367183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce 930M major: 5 minor: 0 memoryClockRate(GHz): 0.941
pciBusID: 0000:01:00.0
2019-11-22 12:26:38.367210: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-22 12:26:38.367221: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-22 12:26:38.367230: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-11-22 12:26:38.367239: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-11-22 12:26:38.367248: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-11-22 12:26:38.367257: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-11-22 12:26:38.367266: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-22 12:26:38.367314: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-22 12:26:38.367786: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-22 12:26:38.368220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-22 12:26:38.368245: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-22 12:26:38.368989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-22 12:26:38.369001: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-11-22 12:26:38.369006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-11-22 12:26:38.369111: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-22 12:26:38.369593: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-22 12:26:38.370053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1422 MB memory) -> physical GPU (device: 0, name: GeForce 930M, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-11-22 12:26:40.754438: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-22 12:26:41.456357: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-22 12:26:41.816442: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-22 12:26:41.820286: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-22 12:26:41.820332: W tensorflow/stream_executor/stream.cc:1919] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "high_level_GRU_1.py", line 56, in <module>
a = model(input_example_batch)
File "/home/okami/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 891, in __call__
outputs = self.call(cast_inputs, *args, **kwargs)
File "/home/okami/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/sequential.py", line 256, in call
return super(Sequential, self).call(inputs, training=training, mask=mask)
File "/home/okami/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/network.py", line 708, in call
convert_kwargs_to_constants=base_layer_utils.call_context().saving)
File "/home/okami/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/network.py", line 860, in _run_internal_graph
output_tensors = layer(computed_tensors, **kwargs)
File "/home/okami/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 891, in __call__
outputs = self.call(cast_inputs, *args, **kwargs)
File "/home/okami/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/core.py", line 1045, in call
outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]])
File "/home/okami/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 4077, in tensordot
ab_matmul = matmul(a_reshape, b_reshape)
File "/home/okami/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/okami/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 2765, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/home/okami/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6126, in mat_mul
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(5000, 1024), b.shape=(1024, 65), m=5000, n=65, k=1024 [Op:MatMul] name: sequential/dense/Tensordot/MatMul/
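For reference, the shapes in the error message appear to follow directly from the hyperparameters in the script: the traceback shows the Dense layer going through tensordot, which reshapes the GRU output of shape (BATCH_SIZE, seq_length, rnn_units) into a 2-D matrix before the matmul. A small sketch of that arithmetic in plain Python (no TensorFlow needed; the names m, k, n mirror the error message, and vocab_size = 65 is taken from the comment in the script):

```python
# Hyperparameters copied from the script above.
BATCH_SIZE = 50
seq_length = 100   # each input chunk has seq_length characters after the split
rnn_units = 1024
vocab_size = 65    # len(vocab) for the Shakespeare text, per the script comment

# tensordot flattens (batch, time, units) into (batch * time, units).
m = BATCH_SIZE * seq_length
k = rnn_units
n = vocab_size

a_shape = (m, k)   # flattened GRU output
b_shape = (k, n)   # Dense kernel
print(a_shape, b_shape)
```

So the failing GEMM is a fairly small matrix product, which suggests the failure comes from cuBLAS initialization (as the two "failed to create cublas handle" lines indicate) rather than from the size of the operation itself.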
My nvidia-smi output:
Fri Nov 22 12:34:23 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 930M        On   | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P5    N/A /  N/A |    350MiB /  2004MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1037      G   /usr/lib/xorg/Xorg                            27MiB |
|    0      1239      G   /usr/bin/gnome-shell                          48MiB |
|    0      2706      G   /usr/lib/xorg/Xorg                           115MiB |
|    0      2923      G   /usr/bin/gnome-shell                          74MiB |
|    0     17495      G   ...quest-channel-token=6887620846227821225    78MiB |
|    0     19795      G   gnome-control-center                           1MiB |
+-----------------------------------------------------------------------------+
I tried googling my problem and found this:
But I don't have tensorflow_backend.py anywhere in my ~/anaconda3/... directory. Can I fix this without reinstalling TensorFlow? Thanks.
I had a similar problem and got it working.
I was using: TensorFlow 2.0, CUDA 10.2, cuDNN 7.6.5.
This configuration kept throwing the same "Blas GEMM launch failed" error as yours. I tried most of the workarounds (pip install tf-nightly, dumping the cache, etc.). I did have tensorflow_backend.py in my 'keras\backend' folder, but the code snippet from the link did not exist in it. In the end I uninstalled CUDA 10.2 and downloaded and installed CUDA 10.0, and the error no longer appeared. In the test models that I ran, the GPU was being utilized and everything seems fine so far (it's only been 20 minutes... fingers crossed).
If you have questions about installing cuDNN, refer to the link below. The process is very manual. https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html
Conclusion: downgrade your CUDA 10.1 to CUDA 10.0
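Before downgrading, it may help to check which CUDA libraries TensorFlow actually loaded. The "CUDA Version" field in nvidia-smi reports the highest CUDA version the driver supports, not necessarily the toolkit in use, whereas the dso_loader lines in the startup log name the libraries that were opened. Below is a minimal sketch that extracts those versions from a log; it is only an illustration, the helper name loaded_cuda_libraries is made up for this example, and the sample lines are copied from the log in the question.

```python
import re

# Matches log lines such as:
#   ... Successfully opened dynamic library libcudart.so.10.0
DSO_LINE = re.compile(
    r"Successfully opened dynamic library (lib\w+)\.so\.([\d.]+)"
)

def loaded_cuda_libraries(log_text):
    """Return {library name: version string} for libraries named in the log.

    Hypothetical helper for illustration; it only parses text and does not
    query TensorFlow or the GPU.
    """
    versions = {}
    for name, version in DSO_LINE.findall(log_text):
        versions[name] = version
    return versions

# Sample lines taken from the log in the question.
sample_log = """\
2019-11-22 12:26:38.175152: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-11-22 12:26:38.201683: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-22 12:26:38.217994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-22 12:26:38.305535: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
"""

libs = loaded_cuda_libraries(sample_log)
for name, version in sorted(libs.items()):
    print(name, version)
```

If the log already reports libcudart.so.10.0 and libcublas.so.10.0, as it does in the question, the CUDA 10.0 runtime is the one being loaded.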