
Problem with tensorflow 2.0 gpu, unknown error

I'm trying to run tensorflow-gpu 2.0 on Windows 10 in a conda environment. The code is the basic tutorial from the TensorFlow page:

from __future__ import absolute_import, division, print_function, unicode_literals 
import tensorflow as tf

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)

model.evaluate(x_test,  y_test, verbose=2)

I don't understand the error, and I have already uninstalled and reinstalled everything. Could it be that I haven't installed keras-gpu yet? I'm just getting started with this library, please help :(

Epoch 1/5
2020-01-24 23:40:35.430377: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-01-24 23:40:35.923375: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.933612: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.941088: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.952234: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.961783: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.970378: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.976378: W tensorflow/stream_executor/stream.cc:1919] attempting to perform BLAS operation using StreamExecutor without BLAS support
2020-01-24 23:40:35.986426: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Internal: Blas GEMM launch failed : a.shape=(32, 784), b.shape=(784, 128), m=32, n=128, k=784
         [[{{node sequential/dense/MatMul}}]]
   32/60000 [..............................] - ETA: 2:37:06
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 728, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 324, in fit
    total_epochs=epochs)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 123, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 86, in execution_function
    distributed_function(input_fn))
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 520, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1823, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1141, in _filtered_call
    self.captured_inputs)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1224, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 511, in call
    ctx=ctx)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError:  Blas GEMM launch failed : a.shape=(32, 784), b.shape=(784, 128), m=32, n=128, k=784
         [[node sequential/dense/MatMul (defined at C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_distributed_function_706]

Function call stack:
distributed_function

>>>
>>> model.evaluate(x_test,  y_test, verbose=2)
2020-01-24 23:40:36.878248: I tensorflow/stream_executor/stream.cc:1868] [stream=000002DA3ACFDB20,impl=000002DA3B9C8060] did not wait for [stream=000002DA3ACFD9A0,impl=000002DA3B9C7F70]
2020-01-24 23:40:36.892612: I tensorflow/stream_executor/stream.cc:4816] [stream=000002DA3ACFDB20,impl=000002DA3B9C8060] did not memcpy host-to-device; source: 000002DAA3AF8C80
2020-01-24 23:40:36.901014: F tensorflow/core/common_runtime/gpu/gpu_util.cc:342] CPU->GPU Memcpy failed


Igor, are you setting the GPU device?

https://devblogs.nvidia.com/cuda-pro-tip-always-set-current-device-avoid-multithreading-bugs/

https://www.tensorflow.org/guide/gpu

See what devices you have:

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

It's also possible to set the device manually:

tf.debugging.set_log_device_placement(True)

# Place tensors on the CPU
with tf.device('/CPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
    print(c)
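
Once a GPU shows up in the device list, the same with-block can target it explicitly (a minimal sketch, assuming a single GPU exposed as '/GPU:0'):

# Place the same tensors on the first GPU instead (assumes a visible GPU)
with tf.device('/GPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
    print(c)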

so your code would be something like:

with tf.device('/CPU:0'):
    mnist = tf.keras.datasets.mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
        ])

    model.compile(optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5)

    model.evaluate(x_test,  y_test, verbose=2)
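
If it trains cleanly with everything pinned to '/CPU:0', the problem is on the GPU side. CUBLAS_STATUS_ALLOC_FAILED usually means cuBLAS could not allocate GPU memory, and the tensorflow.org/guide/gpu page linked above also covers enabling memory growth, which often resolves it. A minimal sketch (same API as in the guide), to be run before the model is built:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all up front
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)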
