[英]Keras tuner and TPU in Google Colab
I have some problems with keras tuner and tpu.我在使用 keras 调谐器和 tpu 时遇到一些问题。 When I run the code below, everything works well and.network training is fast.
当我运行下面的代码时,一切正常,网络训练速度很快。
vocab_size = 5000
embedding_dim = 64
max_length = 2000
def create_model():
model = tf.keras.Sequential([
tf.keras.layers.Embedding(vocab_size, embedding_dim),
tf.keras.layers.LSTM(100, dropout=0.5, recurrent_dropout=0.5),
tf.keras.layers.Dense(embedding_dim, activation='relu'),
tf.keras.layers.Dense(4, activation='softmax')
])
return model
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)
with strategy.scope():
model = create_model()
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['sparse_categorical_accuracy'])
model.fit(train_padded, y_train,
epochs=10,
validation_split=0.15,
verbose=1, batch_size=128)
When I use a keras tuner, the neural.network learns slowly.当我使用 keras 调谐器时,神经网络学习缓慢。 I believe that TPU is not used.
我相信没有使用TPU。
vocab_size = 5000
max_length = 2000
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)
def build_model(hp):
model = tf.keras.Sequential()
activation_choice = hp.Choice('activation', values=['relu', 'sigmoid', 'tanh', 'elu', 'selu'])
embedding_dim = hp.Int('units_hidden', min_value=128, max_value=24, step=8)
model.add(tf.keras.layers.Embedding(vocab_size, embedding_dim))
model.add(tf.keras.layers.LSTM(hp.Int('LSTM_Units', min_value=50, max_value=500, step=10),
dropout=hp.Float('dropout', 0, 0.5, step=0.1, default=0),
recurrent_dropout=hp.Float('recurrent_dropout', 0, 0.5, step=0.1, default=0)))
model.add(tf.keras.layers.Dense(embedding_dim, activation=activation_choice))
model.add(tf.keras.layers.Dense(4, activation='softmax'))
model.compile(
optimizer=hp.Choice('optimizer', values=['adam', 'rmsprop', 'SGD']),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['sparse_categorical_accuracy'])
return model
with strategy.scope():
tuner = Hyperband(
build_model,
objective='val_accuracy',
max_epochs=10,
hyperband_iterations=2)
tuner.search(train_padded, y_train,
batch_size=128,
epochs=10,
callbacks=[EarlyStopping(patience=1)],
validation_split=0.15,
verbose=1)
best_models = tuner.get_best_models(1)
best_model.save('/content/drive/My Drive/best_model.h5')
How to make a keras tuner work with TPU?如何让 keras 调谐器与 TPU 一起工作?
You need to pass it to the tuner:您需要将其传递给调谐器:
tuner = Hyperband(
build_model,
objective='val_accuracy',
max_epochs=10,
hyperband_iterations=2,
distribution_strategy=strategy,)
(and remove the strategy.scope() part) (并删除 strategy.scope() 部分)
To add...加上...
I don't use Google Colab, but Kaggle.我不使用 Google Colab,而是使用 Kaggle。 Using TPU, I get that same error "File system scheme '[local]' not implemented", when the tuner tries to write the checkpoints on Kaggle's working directory.
使用 TPU,当调谐器尝试在 Kaggle 的工作目录上写入检查点时,我得到了同样的错误“文件系统方案‘[本地]’未实现”。
Since I don't have a gs://location, I just "modified" the function called by Keras Tuner to save checkpoints, to allow writing to local dir, which is the Kaggle working directory.由于我没有 gs://location,我只是“修改”了 Keras Tuner 调用的 function 来保存检查点,以允许写入本地目录,这是 Kaggle 的工作目录。 I used patch() to mock the function.
我使用 patch() 来模拟 function。
First important thing is that Keras Tuner must be version 1.1.2 and above.首先重要的是 Keras Tuner 必须是 1.1.2 及以上版本。
Example:例子:
from mock import patch
<your code>
# now the new function to "replace" the existing one (keras_tuner.engine.tuner_utils.SaveBestEpoch.on_epoch_end)
def new_on_epoch_end(self, epoch, logs=None):
if not self.objective.has_value(logs):
# Save on every epoch if metric value is not in the logs. Either no
# objective is specified, or objective is computed and returned
# after `fit()`.
#***** the following are the lines I added ******************************************
# Save model in Tensorflow's "SavedModel" format
save_locally = tf.saved_model.SaveOptions(experimental_io_device = '/job:localhost')
# I then added ', options = save_locally' to the line below.
#************************************************************************************
self.model.save_weights(self.filepath, options = save_locally)
return
current_value = self.objective.get_value(logs)
if self.objective.better_than(current_value, self.best_value):
self.best_value = current_value
#***** the following are the lines I added ******************************************
# Save model in Tensorflow's "SavedModel" format
save_locally = tf.saved_model.SaveOptions(experimental_io_device = '/job:localhost')
# I then added ', options = save_locally' to the line below.
#************************************************************************************
self.model.save_weights(self.filepath, options = save_locally)
with patch('keras_tuner.engine.tuner_utils.SaveBestEpoch.on_epoch_end', new_on_epoch_end):
# Perform hypertuning. The parameters are exactly like those in the fit() method.
tuner.search(
X_train,
y_train,
epochs=num_of_epochs,
validation_data = (X_valid, y_valid),
callbacks=[early_stopping]
)
<more of your code>
Since I used 'with patch', after all is done, it reverts back to the original code automatically.由于我使用了'with patch',所以完成后,它会自动恢复到原来的代码。
I hope this will be useful for those using Kaggle, or those who want to write to a local dir.我希望这对使用 Kaggle 的人或想要写入本地目录的人有用。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.