Why does training with tf.GradientTape in TensorFlow 2 behave differently from training with the fit API?
I am new to TensorFlow 2. I am familiar with using Keras in TensorFlow 1, and I usually train models with the fit API. But TensorFlow 2 recently introduced eager execution, so I implemented a simple image classifier on the CIFAR-10 dataset and compared training it with fit against training it with tf.GradientTape, for 20 epochs each.
After multiple runs, the results were as follows:
Model trained with the fit API: [plot]
Model trained with tf.GradientTape: [plot]
I am not sure why the two models behave differently; I may have implemented something incorrectly. It seems strange that the tf.GradientTape model starts overfitting the training set sooner.
Here are some snippets:
fit API:
model = SimpleClassifier(10)
model.compile(
    optimizer=Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[tf.keras.metrics.CategoricalAccuracy()]
)
model.fit(X[:split_idx, :, :, :], y[:split_idx, :], batch_size=256, epochs=20, validation_data=(X[split_idx:, :, :, :], y[split_idx:, :]))
tf.GradientTape:
with tf.GradientTape() as tape:
    y_pred = model(tf.stop_gradient(train_X))
    loss = loss_fn(train_y, y_pred)
gradients = tape.gradient(loss, model.trainable_weights)
model.optimizer.apply_gradients(zip(gradients, model.trainable_weights))
The full code can be found in Colab.
Reference: tf.GradientTape
A few things in the code could be fixed:
1) Use trainable_variables, not trainable_weights. You want to apply the gradients to all trainable variables, not just the model weights:
# gradients = tape.gradient(loss, model.trainable_weights)
gradients = tape.gradient(loss, model.trainable_variables)
# and
# model.optimizer.apply_gradients(zip(gradients, model.trainable_weights))
model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
2) Remove tf.stop_gradient from the input tensor:
with tf.GradientTape() as tape:
    # y_pred = model(tf.stop_gradient(train_X))
    y_pred = model(train_X, training=True)
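To see what tf.stop_gradient does to backpropagation, here is a minimal standalone sketch (not from the question's code): any path through a stopped tensor contributes no gradient, and a variable reachable only through such a path comes back as None:
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = tf.stop_gradient(x) * 2.0  # x is only reachable through the stopped tensor
print(tape.gradient(y, x))  # prints None: no gradient flows back to x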
Note that in the fix above I also added the training argument. It should also be threaded through the model definition so that phase-dependent layers (such as BatchNormalization and Dropout) behave correctly:
def call(self, X, training=None):
    X = self.cnn_1(X)
    X = self.bn_1(X, training=training)  # BatchNormalization needs the phase flag
    X = self.cnn_2(X)
    X = self.max_pool_2d(X)
    X = self.dropout_1(X)
    X = self.cnn_3(X)
    X = self.bn_2(X, training=training)
    X = self.cnn_4(X)
    X = self.bn_3(X, training=training)
    X = self.cnn_5(X)
    X = self.max_pool_2d(X)
    X = self.dropout_2(X)
    X = self.flatten(X)
    X = self.dense_1(X)
    X = self.dropout_3(X, training=training)  # Dropout needs it too
    X = self.dense_2(X)
    return self.out(X)
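As a quick usage sketch (variable names assumed from the question's snippets): in a custom loop the phase flag must be passed explicitly on every call, whereas fit and evaluate set it for you:
y_train_pred = model(train_X, training=True)   # training phase: Dropout on, BN uses batch statistics
y_val_pred = model(val_X, training=False)      # inference phase: Dropout off, BN uses moving averages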
With these few changes I managed to get slightly better scores, more comparable to the keras.fit results:
[19/20] loss: 0.64020, acc: 0.76965, val_loss: 0.71291, val_acc: 0.75318: 100%|██████████| 137/137 [00:12<00:00, 11.25it/s]
[20/20] loss: 0.62999, acc: 0.77649, val_loss: 0.77925, val_acc: 0.73219: 100%|██████████| 137/137 [00:12<00:00, 11.30it/s]
Answer: the difference is most likely that Keras.fit does most of these things for you under the hood.
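As a rough sketch of what fit runs per batch (a simplification, not the actual Keras source), overriding tf.keras.Model.train_step makes those steps explicit:
import tensorflow as tf

class SketchModel(tf.keras.Model):
    # Simplified stand-in for the per-batch work that fit performs.
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)              # phase flag set for BN/Dropout
            loss = self.compiled_loss(y, y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)    # keep metrics in sync
        return {m.name: m.result() for m in self.metrics}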
Finally, for clarity and reproducibility, here is part of the training/evaluation code I used:
for bIdx, (train_X, train_y) in enumerate(train_batch):
    if bIdx < epoch_max_iter:
        with tf.GradientTape() as tape:
            y_pred = model(train_X, training=True)
            loss = loss_fn(train_y, y_pred)
        total_loss += (np.sum(loss.numpy()) * train_X.shape[0])
        total_num += train_X.shape[0]
        # gradients = tape.gradient(loss, model.trainable_weights)
        gradients = tape.gradient(loss, model.trainable_variables)
        total_acc += (metrics(train_y, y_pred) * train_X.shape[0])
        running_loss = (total_loss / total_num)
        running_acc = (total_acc / total_num)
        # model.optimizer.apply_gradients(zip(gradients, model.trainable_weights))
        model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        pbar.set_description("[{}/{}] loss: {:.5f}, acc: {:.5f}".format(e, epochs, running_loss, running_acc))
        pbar.refresh()
        pbar.update()
And the evaluation loop:
# Eval loop
# Something may be calculated incorrectly here
val_total_loss = 0
val_total_acc = 0
total_val_num = 0
for bIdx, (val_X, val_y) in enumerate(val_batch):
    if bIdx >= max_val_iterations:
        break
    y_pred = model(val_X, training=False)
    # The original snippet is truncated here; the accumulation below is an
    # assumed completion mirroring the training loop above.
    val_loss = loss_fn(val_y, y_pred)
    val_total_loss += (np.sum(val_loss.numpy()) * val_X.shape[0])
    val_total_acc += (metrics(val_y, y_pred) * val_X.shape[0])
    total_val_num += val_X.shape[0]
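One more difference worth noting about the loops above: by default fit also compiles the per-batch work into a graph with tf.function, while a hand-written loop like this runs eagerly. A custom loop can do the same (a sketch reusing the assumed names model and loss_fn from above):
@tf.function  # trace once, then run as a graph, as fit does by default
def train_step(train_X, train_y):
    with tf.GradientTape() as tape:
        y_pred = model(train_X, training=True)
        loss = loss_fn(train_y, y_pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss, y_pred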