What is the difference between Model.train_on_batch from keras and Session.run([train_optimizer]) from tensorflow?
In the following Keras and TensorFlow implementations of training a neural network, how does model.train_on_batch([x], [y])
in the Keras implementation differ from sess.run([train_optimizer, cross_entropy, accuracy_op], feed_dict=feed_dict)
in the TensorFlow implementation? In particular: how can those two lines lead to different computations during training?
keras_version.py
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import Adam

input_x = Input(shape=input_shape, name="x")
c = Dense(num_classes, activation="softmax")(input_x)

model = Model([input_x], [c])
opt = Adam(lr)
# metrics=['accuracy'] is needed so train_on_batch returns (loss, acc)
model.compile(loss=['categorical_crossentropy'], optimizer=opt, metrics=['accuracy'])

nb_batchs = int(len(x_train) / batch_size)
for epoch in range(epochs):
    loss = 0.0
    for batch in range(nb_batchs):
        x = x_train[batch*batch_size:(batch+1)*batch_size]
        y = y_train[batch*batch_size:(batch+1)*batch_size]
        loss_batch, acc_batch = model.train_on_batch([x], [y])
        loss += loss_batch
    print(epoch, loss / nb_batchs)
tensorflow_version.py
import tensorflow as tf
from keras.layers import Input, Dense

input_x = Input(shape=input_shape, name="x")
c = Dense(num_classes)(input_x)  # raw logits; no softmax activation here

input_y = tf.placeholder(tf.float32, shape=[None, num_classes], name="label")
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=input_y, logits=c, name="xentropy"),
    name="xentropy_mean"
)
train_optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(cross_entropy)

nb_batchs = int(len(x_train) / batch_size)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        loss = 0.0
        acc = 0.0
        for batch in range(nb_batchs):
            x = x_train[batch*batch_size:(batch+1)*batch_size]
            y = y_train[batch*batch_size:(batch+1)*batch_size]
            feed_dict = {input_x: x, input_y: y}
            _, loss_batch = sess.run([train_optimizer, cross_entropy], feed_dict=feed_dict)
            loss += loss_batch
        print(epoch, loss / nb_batchs)
Note: This question follows "Same (?) model converges in Keras but not in Tensorflow", which was considered too broad, but in which I show exactly why I think those two statements are somehow different and lead to different computations.
Yes, the results can be different. The results shouldn't be surprising if you know the following things in advance:

The cross-entropy implementations in TensorFlow and Keras are different. TensorFlow assumes the input to tf.nn.softmax_cross_entropy_with_logits_v2 to be the raw unnormalized logits, while Keras accepts the inputs as probabilities (the output of the softmax layer).
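The consequence can be sketched with plain NumPy (a minimal illustration with made-up values, not the actual framework code): the two definitions agree mathematically, but TensorFlow computes the loss directly from the logits via log-sum-exp, while Keras takes the log of already-softmaxed, clipped probabilities, so the two paths differ numerically and produce different backward graphs.

```python
import numpy as np

# Hypothetical single example with 3 classes (values chosen for illustration).
logits = np.array([2.0, 1.0, 0.1])
y = np.array([1.0, 0.0, 0.0])  # one-hot label

# TensorFlow-style: loss computed directly from raw logits,
# using the log-sum-exp trick for numerical stability.
log_sum_exp = np.log(np.sum(np.exp(logits - logits.max()))) + logits.max()
ce_from_logits = -np.sum(y * (logits - log_sum_exp))

# Keras-style: the model's softmax layer produces probabilities first;
# the loss then clips them away from 0 and 1 and takes the log.
probs = np.exp(logits) / np.sum(np.exp(logits))
eps = 1e-7  # small clipping constant, as Keras-style losses use
probs_clipped = np.clip(probs, eps, 1.0 - eps)
ce_from_probs = -np.sum(y * np.log(probs_clipped))

print(ce_from_logits, ce_from_probs)  # agree to within clipping error here
```

On well-behaved inputs like this one the two values coincide, but with very large or very small logits the probability path saturates (log of a clipped value), whereas the logits path stays exact, and gradients are computed through different expressions.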
The optimizer implementations in Keras and TensorFlow are different.
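For example, the default epsilon values differ between the two Adam implementations of that era (roughly 1e-7 in Keras via K.epsilon() versus 1e-8 in tf.train.AdamOptimizer). A textbook Adam step in NumPy shows the effect; this is a sketch, not either framework's actual code (TensorFlow, for instance, folds the bias correction into the learning rate and adds epsilon to sqrt(v) rather than sqrt(v_hat)):

```python
import numpy as np

def adam_step(grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One bias-corrected textbook Adam update for a single parameter."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    return lr * m_hat / (np.sqrt(v_hat) + eps), m, v

grad = 1e-6  # with tiny gradients, sqrt(v_hat) is comparable to eps
update_small_eps, _, _ = adam_step(grad, 0.0, 0.0, t=1, eps=1e-8)  # TF-like default
update_large_eps, _, _ = adam_step(grad, 0.0, 0.0, t=1, eps=1e-7)  # Keras-like default

print(update_small_eps, update_large_eps)  # the two steps differ by several percent
```

With small gradients, sqrt(v_hat) shrinks toward epsilon's magnitude, so even this default alone changes the effective step size, and the difference compounds over many updates.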