Blas GEMM launch failed when using TensorFlow GPU with Keras

Pretty self-explanatory. Like countless people before and probably after me, I get a Blas GEMM launch failed error message when trying to call model.fit().

This is the output of nvidia-smi before calling model.compile():

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   45C    P0    74W / 149W |      0MiB / 11441MiB |    100%      Default |   <<<--- 0% Memory usage
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |   <<<--- nothing running
+-----------------------------------------------------------------------------+

And this is the output of nvidia-smi after calling model.compile() (and immediately before model.fit()):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   45C    P0    72W / 149W |  10942MiB / 11441MiB |      0%      Default |   <<<--- 96% Memory usage
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1811      C   /usr/bin/python3                           10929MiB |   <<<--- TF model here
+-----------------------------------------------------------------------------+

It looks like the compiled TensorFlow model monopolises 96% of the GPU memory. I have no idea whether this is normal or not, and whether it could be the cause of the later error when trying to train the model.

The error message itself is the following:

tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support

InternalError: Blas GEMM launch failed : a.shape=(32, 116032), b.shape=(116032, 256), m=32, n=256, k=116032 [[node dense_1/MatMul (defined at /home/ubuntu/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3009) ]] [Op:__inference_keras_scratch_graph_1645]

Function call stack: keras_scratch_graph

Output of tf.config.experimental.list_physical_devices():

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU'),
 PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU'),
 PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
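
Since the failing op is a plain MatMul, a minimal way to check whether a bare GPU matmul works at all outside Keras would be something like the following (just a debugging sketch, reusing the shapes from the error message):

import tensorflow as tf

# Minimal check: run a small matrix multiplication directly on the GPU,
# with the same shapes as in the error message. If cuBLAS cannot be
# initialised, this should fail with the same "Blas GEMM launch failed" error.
with tf.device('/GPU:0'):
    a = tf.random.normal([32, 116032])
    b = tf.random.normal([116032, 256])
    c = tf.matmul(a, b)
print(c.shape)  # expected: (32, 256)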

The model is built using the following:

  • Keras 2.3.1 (using keras.models.Sequential)
  • TensorFlow-GPU 2.1.0
  • CUDA 10.1
  • cuDNN 7.6.4
  • Ubuntu 18.04
  • AWS p2.xlarge instance (featuring a Tesla K80 GPU)

I've gone through countless GitHub issues, blog posts, and SO questions, all recommending making sure that no previously running process is still active on the GPU when starting a new one, adding the CUPTI location to LD_LIBRARY_PATH, or using various TF options... None of it solved the issue. Any idea of what causes this and how to solve it would be appreciated.
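
(For reference, the "TF options" those threads suggest are typically along the lines of letting TensorFlow allocate GPU memory on demand, for example via the TF_FORCE_GPU_ALLOW_GROWTH environment variable; a rough sketch:)

import os

# One commonly suggested option: make TensorFlow allocate GPU memory
# on demand instead of mapping almost all of it up front.
# The variable must be set before TensorFlow initialises the GPU.
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

import tensorflow as tf  # imported only after setting the variable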

I had the same problem. I saw many answers and tried a lot of the suggested code, but nothing helped me.

For me the problem was GPU memory usage, so I limited the memory used by my GPU with the following code:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)

Taken from https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth. This resolved my problem. I hope it will resolve yours too.
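
The same guide section also describes a softer alternative: instead of a hard 1 GB cap, let TensorFlow grow its GPU memory allocation as needed. A rough sketch of that variant:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Allocate GPU memory on demand instead of reserving it all up front.
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

Either way, the configuration has to run before anything touches the GPU (i.e. before the model is built), otherwise TensorFlow raises the RuntimeError caught above.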
