
keras model.fit() and TF tape.gradient() giving different results

I have a model that I am building using the keras functional API. After defining it, I compile it with the SGD optimizer as follows.

opt = tf.keras.optimizers.SGD(learning_rate=0.05, momentum=0.9, decay=1e-3, clipnorm=1)
model.compile(optimizer=opt, loss='mse')
model.fit(train_datagen, epochs=50, shuffle=True, verbose=True)

This works fine and my model converges as expected.

However, when I try to implement the same exact functionality using TF's GradientTape, I consistently get NaN gradients, which cause my weights to become NaN and subsequently my loss value to become NaN as well. Here is the code that I use:

import random
import tensorflow as tf

opt = tf.keras.optimizers.SGD(learning_rate=0.05, momentum=0.9, decay=1e-3, clipnorm=1)
loss_fn = tf.keras.losses.MeanSquaredError()

epochs = 50

for epoch in range(epochs):

    # visit the batches in a random order each epoch (equivalent to shuffle=True)
    batch_list = list(range(len(train_datagen)))
    random.shuffle(batch_list)

    running_loss = 0

    for ii in batch_list:
        x, y_true = train_datagen[ii]

        with tf.GradientTape() as tape:
            y_pred = model(x, training=True)
            loss_value = loss_fn(y_true, y_pred)

        # backprop and apply one SGD step
        grads = tape.gradient(loss_value, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))

        running_loss += loss_value

    print('Epoch', epoch, 'Running Loss:', running_loss.numpy() / len(batch_list))
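
For reference, a quick way to pinpoint which gradient goes NaN first (a diagnostic sketch, assuming eager execution and dense gradient tensors, not part of the training logic) is to test each gradient right after the tape.gradient call:

for g, v in zip(grads, model.trainable_variables):
    # g can be None for variables not connected to the loss
    if g is not None and not bool(tf.reduce_all(tf.math.is_finite(g))):
        print('Non-finite gradient for variable', v.name)

This prints the name of every variable whose gradient contains NaN or Inf, which usually points at the offending layer.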

Is the code that I wrote equivalent to the Keras model.fit() functionality? For some reason, when I use the above code, I consistently get NaN gradients, but with model.fit() this never happens.

I think this is because the class-based loss, MeanSquaredError, requires some extra tinkering to get it working outside of the .fit method. Instead, use the functional one to make it easier. Just call it inside the training step like this:

with tf.GradientTape() as tape:
    y_pred = model(x, training=True)
    loss_value = tf.keras.losses.mean_squared_error(y_true, y_pred)
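
Note that the functional tf.keras.losses.mean_squared_error only averages over the last axis, so loss_value here is a vector with one entry per sample rather than a scalar. tape.gradient still handles that (it implicitly sums the components), but if you want to match the SUM_OVER_BATCH_SIZE reduction that the MeanSquaredError class and model.fit apply by default, a small variation (a sketch, not from the original answer) is to reduce it explicitly:

with tf.GradientTape() as tape:
    y_pred = model(x, training=True)
    # mean over the last axis per sample, then mean over the batch
    loss_value = tf.reduce_mean(tf.keras.losses.mean_squared_error(y_true, y_pred))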
