Why does a silent TensorFlow upgrade to 2.9.1 break Google Colab Jupyter notebooks that use a GPU, but not ones that use a CPU?

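Sometime in the past three days, TensorFlow on Google Colab was upgraded from 2.8.x to 2.9.1. The upgrade broke all of my current research notebooks, including the minimal MNIST example included here. A thorough review of the release notes did not turn up a change to any of the Keras or TensorFlow packages I use.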

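Further investigation showed that the error occurs only when the Colab runtime includes a GPU. The same code runs fine on a Colab CPU or TPU runtime. Here is a short example (35 lines) that reproduces the error: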


    import tensorflow as tf
    import keras
    
    # the data, split between train and test sets
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    
    x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
    x_test  = x_test.reshape ( x_test.shape[0], 28, 28, 1)
    input_shape = (28, 28, 1)
    
    x_train  = x_train.astype('float32')
    x_test   = x_test.astype('float32')
    x_train /= 255
    x_test  /= 255
    
    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(y_train, 10)
    y_test  = keras.utils.to_categorical(y_test , 10)
    
    model = keras.models.Sequential()
    model.add(keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
    model.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(keras.layers.Dropout(0.25))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dropout(0.5))
    model.add(keras.layers.Dense(10, activation='softmax'))
    
    model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adam(),
                  metrics=['accuracy'])
    
    model.fit(x_train, y_train, batch_size=100, epochs=1, verbose=1, validation_data=(x_test, y_test))
    score = model.evaluate(x_test, y_test, verbose=0)
    print('Test loss:', score[0], 'Test accuracy:', score[1])

I have included the error traceback below.

When I downgrade back to TensorFlow 2.8.2, the error goes away and all of my Colab notebooks work correctly.
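To tell which of these cases a given runtime falls into, a check along the following lines may help. It is only a sketch: `describe_runtime` and `is_affected` are hypothetical helper names, and the rule in `is_affected` encodes nothing more than the observation above (2.9.x with a GPU attached fails, everything else works).

```python
def describe_runtime():
    """Return (tensorflow_version, gpu_device_names).

    The version is None when TensorFlow cannot be imported.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None, []
    # list_physical_devices reports the GPUs TensorFlow can see in this runtime.
    gpus = [device.name for device in tf.config.list_physical_devices("GPU")]
    return tf.__version__, gpus


def is_affected(version, gpus):
    """Assumption taken from the report above: only 2.9.x with a GPU fails."""
    return version is not None and version.startswith("2.9.") and len(gpus) > 0


if __name__ == "__main__":
    version, gpus = describe_runtime()
    print("TensorFlow version:", version)
    print("Visible GPUs:", gpus)
    print("Matches the failing combination:", is_affected(version, gpus))
```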

To keep using the GPU, the current workaround is the following, which adds 86 seconds to every run:


    !pip install tensorflow==2.8.2
    import tensorflow as tf
    print(tf.__version__)
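Since the pin costs time on every run, one option is to install it only when the runtime does not already have 2.8.2. The sketch below is an illustration, not a tested Colab recipe: the helper names are hypothetical, it assumes Python 3.8 or later (for `importlib.metadata`), and it assumes the package is installed under the distribution name `tensorflow`, as in the pip command above.

```python
import importlib.metadata
import subprocess
import sys

PINNED_VERSION = "2.8.2"  # the version reported above as working on GPU


def installed_tensorflow_version():
    """Read the installed version from package metadata without importing TensorFlow."""
    try:
        return importlib.metadata.version("tensorflow")
    except importlib.metadata.PackageNotFoundError:
        return None


def pip_command_if_needed(current, pinned=PINNED_VERSION):
    """Return the pip command to run, or None when the pinned version is already installed."""
    if current == pinned:
        return None
    return [sys.executable, "-m", "pip", "install", "tensorflow==" + pinned]


def ensure_pinned():
    """Install the pinned version only if needed; return the command that was run, if any."""
    command = pip_command_if_needed(installed_tensorflow_version())
    if command is not None:
        subprocess.check_call(command)
    return command
```

Calling `ensure_pinned()` in the first cell, before `import tensorflow`, would skip the install once the default version is 2.8.2 again.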

  • Error log when the runtime configuration includes a GPU:
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
---------------------------------------------------------------------------
UnimplementedError                        Traceback (most recent call last)
<ipython-input-1-05f207168698> in <module>
     31               metrics=['accuracy'])
     32 
---> 33 model.fit(x_train, y_train, batch_size=100, epochs=1, verbose=1, validation_data=(x_test, y_test))
     34 score = model.evaluate(x_test, y_test, verbose=0)
     35 print('Test loss:', score[0], 'Test accuracy:', score[1])

1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     53     ctx.ensure_initialized()
     54     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 55                                         inputs, attrs, num_outputs)
     56   except core._NotOkStatusException as e:
     57     if name is not None:

UnimplementedError: Graph execution error:

    [...]

Node: 'sequential/conv2d/Conv2D'
DNN library is not found.
     [[{{node sequential/conv2d/Conv2D}}]] [Op:__inference_train_function_865]
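One common reading of "DNN library is not found" is that TensorFlow could not load a cuDNN library it can use. That is an assumption here, not something the log confirms. Under that assumption, printing the CUDA and cuDNN versions the installed TensorFlow was built against gives something to compare with what the runtime provides. `read_build_info` and `format_report` are hypothetical helper names.

```python
def read_build_info():
    """Collect TensorFlow's version and the CUDA/cuDNN versions it was built against."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        # These keys may be missing on CPU-only builds, hence .get().
        "cuda": build.get("cuda_version"),
        "cudnn": build.get("cudnn_version"),
    }


def format_report(info):
    """Turn the collected build info into a one-line summary."""
    if info is None:
        return "TensorFlow is not installed in this environment."
    return "TensorFlow {tensorflow} was built against CUDA {cuda} and cuDNN {cudnn}.".format(**info)


if __name__ == "__main__":
    print(format_report(read_build_info()))
```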

Google Colab has "fixed" the problem by downgrading the default version of TensorFlow.

    import tensorflow as tf
    print(tf.__version__)

This now outputs 2.8.2 on both GPU and CPU runtimes.

In fact, the code you posted no longer produces the error.

