
Variation in computation of gradient between Keras's backend and Tensorflow

Note: the Keras backend in use is TensorFlow (keras.backend.backend() returns 'tensorflow'). Python 3.5 is used.
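For reference, the environment can be confirmed with a quick check (the version numbers in the comments are only illustrative):

import keras
import keras.backend as K
import tensorflow as tf

print(K.backend())        # 'tensorflow'
print(keras.__version__)  # e.g. 2.x
print(tf.__version__)     # e.g. 1.x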

I have encountered a bug in the computation of gradients. I have replicated the bug in the simple Keras model and Tensorflow model shown below.

from keras.layers import Dense, Input
from keras.models import Model
from keras.optimizers import Adam
import keras

import keras.backend as K
import numpy as np
import tensorflow as tf

class KerasModel(object):
    def __init__(self, seed, dim_size, optimizer, loss_func):


        self.sess=tf.Session()

        I = Input(
                shape=[dim_size], 
                name='i'
                 )

        O = Dense(
                1,
                activation='relu',
                kernel_initializer=keras.initializers.glorot_uniform(seed=seed),
                name='o'
                 )(I)


        self.model = Model(inputs=I,outputs=O)
        self.model.compile(loss=loss_func, optimizer=optimizer)


        # Gradient of the model output w.r.t. its input, exposed two ways:
        # through a raw tf.Session and through a Keras backend function.
        self.action_grads = tf.gradients(self.model.output, self.model.input)
        self.grad_func = K.function(self.model.inputs, self.action_grads)

        self.sess.run(tf.global_variables_initializer())

    def tf_grad(self, X):
        # Evaluate the gradient op in this object's own tf.Session
        return self.sess.run(self.action_grads, feed_dict={self.model.input: X})[0]

    def keras_grad(self, X):
        # Evaluate the same gradient op through keras.backend.function
        return self.grad_func(X)[0]



class TFModel(object):
    def __init__(self, seed, dim_size, optimizer, loss_func):

        self.graph= tf.Graph()

        with self.graph.as_default():
            glorot_uniform= tf.glorot_uniform_initializer(seed=seed)

            O= {
                    'weights': tf.Variable(glorot_uniform([dim_size, 1])),
                    'bias': tf.Variable( tf.zeros(1) )
                }


            w_list= [ 
                O['weights'], O['bias']
                    ] 

            w_list_placeholder= []
            w_list_update= []
            for i in range(0, len(w_list)):
                w_list_placeholder.append( tf.placeholder(tf.float32) )
                w_list_update.append( w_list[i].assign( w_list_placeholder[i] ) )


            I= tf.placeholder(tf.float32, shape= (None, dim_size))

            output= tf.nn.relu( tf.add( tf.matmul( I, O['weights']), O['bias'] ) )

            gradient= tf.gradients(output, I)

            y= tf.placeholder(tf.float32)

            loss= tf.reduce_mean( loss_func(y, output) )

            train = optimizer.minimize(loss)

            self.tensors= {
                'output': output, 'I': I, 'y': y, 'grad':gradient,
                'loss': loss, 'train-op': train, 'w': w_list,
                'w-placeholder': w_list_placeholder, 'w-update': w_list_update
                          }

            self.sess= tf.Session(graph=self.graph)

            self.sess.run( tf.variables_initializer( self.graph.get_collection('variables') ) )


    def train_on_batch(self, X, y):
        _, l=self.sess.run( 
            [self.tensors['train-op'], self.tensors['loss']],
            feed_dict={ 
                self.tensors['I']: X,
                self.tensors['y']: y
                       }
                           )
        return l

    def predict(self, X):
        return self.sess.run(self.tensors['output'], feed_dict={self.tensors['I']: X})

    def get_weights(self):
        return self.sess.run(self.tensors['w'])

    def set_weights(self, new_weights):
        self.sess.run( 
            self.tensors['w-update'], 
            feed_dict={ x:y for x,y in zip(self.tensors['w-placeholder'], new_weights) } 
                     )

    def grad(self, X):
        return self.sess.run(self.tensors['grad'], feed_dict={self.tensors['I']:X})[0]

The two models are identical in layers, initialization, optimizer, loss function, weights, etc.

I use the two models to compute the gradient from the same set of inputs. For the Tensorflow model, this is done with the grad() function:

def grad(self, X):
    return self.sess.run(self.tensors['grad'], feed_dict={self.tensors['I']:X})[0]

For the Keras model, this is done with keras_grad() and tf_grad():

self.action_grads = tf.gradients(self.model.output, self.model.input)
self.grad_func= K.function(self.model.inputs, self.action_grads)

def tf_grad(self, X):
    return self.sess.run(self.action_grads, feed_dict={self.model.input: X,})[0]


def keras_grad(self, X):        
    return self.grad_func(X)[0]

keras_grad() utilizes keras.backend.function() to do so, while tf_grad() utilizes tf.Session().
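For context, keras.backend.function() builds a callable that runs in the session managed by the Keras backend itself (K.get_session()), whereas tf_grad() runs the same ops through the separately created self.sess. A minimal check of that distinction, assuming an instance of the KerasModel class above (the name m is only illustrative):

m = KerasModel(1, 3, tf.train.AdamOptimizer(), keras.losses.mean_squared_error)
print(K.get_session() is m.sess)  # False: the two helpers run in different sessions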

The two models are then trained on the same new batch of inputs. Again, this is shown below:

seed=1
dim_size=3
learning_rate=0.01
e=1e-8


k_adam= KerasModel(
    seed, dim_size, tf.train.AdamOptimizer(learning_rate=learning_rate,epsilon= e),
    keras.losses.mean_squared_error
                  )

tf_model= TFModel(
    seed, dim_size, tf.train.AdamOptimizer(learning_rate=learning_rate,epsilon= e),
    keras.losses.mean_squared_error
                  )


X=np.array([[0.25175066, 0.53507285, 0.3210762 ]])
#X= np.random.random([1,dim_size])
y= np.random.random([1])

print(k_adam.keras_grad([X]))
print(k_adam.tf_grad(X))
print(tf_model.grad(X))

print()
X= np.array([[0.47194079, 0.85071664, 0.25451934]])
#X= np.random.random([1,dim_size])
y= np.random.random([1])
k_adam.model.train_on_batch(X,y)
tf_model.train_on_batch(X,y)

print(k_adam.keras_grad([X]))
print(k_adam.tf_grad(X))
print(tf_model.grad(X))

print()
X= np.random.random([1,dim_size])
y= np.random.random([1])
k_adam.model.train_on_batch(X,y)
tf_model.train_on_batch(X,y)

print(k_adam.keras_grad([X]))
print(k_adam.tf_grad(X))
print(tf_model.grad(X))

Running the code should give you the following output:

[[-0.63922524  1.0297645  -1.1010152 ]]
[[-0.63922524  1.0297645  -1.1010152 ]]
[[-0.63922524  1.0297645  -1.1010152 ]]

[[-0.62922525  1.0397645  -1.0910152 ]]
[[-0.63922524  1.0297645  -1.1010152 ]]
[[-0.62922525  1.0397645  -1.0910152 ]]

[[ ... ]]
....

And some other results.

For the first block of output, I expected the computed arrays of gradients to be the same across the two models, since they are the same model. And this is true.
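For reference, this can also be checked directly on the weights right after construction (not part of the original script; it assumes the k_adam and tf_model instances defined above):

keras_w = k_adam.model.get_weights()   # [kernel, bias] of the Dense layer
tf_w = tf_model.get_weights()          # [weights, bias] from the TF graph
for kw, tw in zip(keras_w, tf_w):
    print(np.allclose(kw, tw))         # should print True for both arrays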

For the second block of output, the two models have each been trained on the same new batch of inputs. So the gradients should differ from the first block of output, but they should still match each other within the second block. This is only partly true: the gradients from the Tensorflow model and from keras_grad() match each other and differ from the first block, as expected.

However, the output of tf_grad() did not change from the first block of output.

From my own testing, I found that the output of tf_grad() is always either 0 or the same set of numbers it produced right after initialization. This behavior persists even when training with different batches.

As mentioned already, the only difference between tf_grad() and keras_grad(), which both come from the Keras model, is that one (tf_grad) is run with tf.Session() while the other (keras_grad) is run with keras.backend.function().

Why does one reflect the training updates while the other does not?

You need to set the session on the Keras TF backend:

K.set_session(self.sess)

Like this:

class KerasModel(object):
    def __init__(self, seed, dim_size, optimizer, loss_func):
        self.sess = tf.Session()

        K.set_session(self.sess)
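With K.set_session(self.sess), Keras builds, trains and evaluates the model in the same session that tf_grad() uses, so both code paths see the variables that model.train_on_batch() updates. Without it, Keras trains in its own session (K.get_session()), while self.sess holds a separate copy of the variable values created by tf.global_variables_initializer(); because the initializer is seeded, that copy starts out identical (which is why the first block of output matched) but is never updated by training, which is why tf_grad() keeps returning the same gradients. A sketch of the corrected constructor, assuming the rest of the class stays exactly as in the question:

class KerasModel(object):
    def __init__(self, seed, dim_size, optimizer, loss_func):
        self.sess = tf.Session()
        K.set_session(self.sess)  # register this session with the Keras backend

        I = Input(shape=[dim_size], name='i')

        O = Dense(
                1,
                activation='relu',
                kernel_initializer=keras.initializers.glorot_uniform(seed=seed),
                name='o'
                 )(I)

        self.model = Model(inputs=I, outputs=O)
        self.model.compile(loss=loss_func, optimizer=optimizer)

        self.action_grads = tf.gradients(self.model.output, self.model.input)
        self.grad_func = K.function(self.model.inputs, self.action_grads)

        self.sess.run(tf.global_variables_initializer())

Alternatively, simply reusing the session Keras already manages (self.sess = K.get_session()) should have the same effect.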
