[英]Why stealthy Tensorflow upgrade to 2.9.1 breaks Google Colab Jupyter notebooks that use GPU but not CPU?
在过去三天的某个时间,Google colab 上的 TensorFlow 从 2.8.x 升级到了 2.9.1。 这次升级打破了我目前所有的研究笔记本,包括我包含的一个最小的 MNIST 示例。 对发行说明的全面审查并没有显示我在 Keras 或 TensorFlow 中使用的任何软件包都已更改。
对此错误的进一步研究表明,它仅在 Colab 运行时包含 GPU 时发生。 它在 Colab CPU 或 TPU 上运行良好。 这是一个重现错误的 34 行示例:
import tensorflow as tf
import keras
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape ( x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test , 10)
model = keras.models.Sequential()
model.add(keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Dropout(0.25))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(10, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adam(),
metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=1, verbose=1, validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0], 'Test accuracy:', score[1])
我在下面列出了错误的痕迹。
当我降级回 tensorflow 2.8.2 时,错误消失了,我所有的协作笔记本都正常工作。
要继续使用 GPU,当前的解决方法是:每次运行增加 86 秒:
!pip install tensorflow==2.8.2
import tensorflow as tf
print(tf.__version__)
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
---------------------------------------------------------------------------
UnimplementedError Traceback (most recent call last)
[<ipython-input-1-05f207168698>](https://localhost:8080/#) in <module>
31 metrics=['accuracy'])
32
---> 33 model.fit(x_train, y_train, batch_size=100, epochs=1, verbose=1, validation_data=(x_test, y_test))
34 score = model.evaluate(x_test, y_test, verbose=0)
35 print('Test loss:', score[0], 'Test accuracy:', score[1])
1 frames
[/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py](https://localhost:8080/#) in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
53 ctx.ensure_initialized()
54 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 55 inputs, attrs, num_outputs)
56 except core._NotOkStatusException as e:
57 if name is not None:
UnimplementedError: Graph execution error:
[...]
Node: 'sequential/conv2d/Conv2D'
DNN library is not found.
[[{{node sequential/conv2d/Conv2D}}]] [Op:__inference_train_function_865]
Google Colab 已通过降级 Tensorflow 的默认版本“解决”了该问题。
import tensorflow as tf
print(tf.__version__)
现在为 GPU 和 CPU 运行时输出 2.8.2。
事实上,现在您发布的代码不再产生错误。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.