Calculating the gradient norm with respect to the weights in Keras

Question

I am attempting to calculate the gradient norm with respect to the weights of a neural network with Keras (as a diagnostic tool). Eventually, I want to create a callback for this, but on the way there I have been working on a function that computes the gradient and returns actual values as a NumPy array or scalar (and not just a TensorFlow tensor). The code is as follows:

import numpy as np
import keras.backend as K
from keras.layers import Dense
from keras.models import Sequential


def get_gradient_norm_func(model):
    grads = K.gradients(model.total_loss, model.trainable_weights)
    summed_squares = [K.sum(K.square(g)) for g in grads]
    norm = K.sqrt(sum(summed_squares))
    func = K.function([model.input], [norm])
    return func


def main():
    x = np.random.random((128,)).reshape((-1, 1))
    y = 2 * x
    model = Sequential(layers=[Dense(2, input_shape=(1,)),
                               Dense(1)])
    model.compile(loss='mse', optimizer='RMSprop')
    get_gradient = get_gradient_norm_func(model)
    history = model.fit(x, y, epochs=1)
    print(get_gradient([x]))

if  __name__ == '__main__':
    main()

The code fails on the call to get_gradient(). The traceback is lengthy and says a lot about shapes, but gives little information on what the correct shape is. How can I correct this?

Ideally, I would like a backend-agnostic solution, but a TensorFlow-based solution is also an option.

2017-08-15 15:39:14.914388: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1,-1] has negative dimensions
2017-08-15 15:39:14.914414: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1,-1] has negative dimensions
         [[Node: dense_2_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
2017-08-15 15:39:14.915026: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1,-1] has negative dimensions
2017-08-15 15:39:14.915038: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1,-1] has negative dimensions
         [[Node: dense_2_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
2017-08-15 15:39:14.915310: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1] has negative dimensions
2017-08-15 15:39:14.915321: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1] has negative dimensions
         [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape [-1] has negative dimensions
         [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "gradientlog.py", line 45, in <module>
    main()
  File "gradientlog.py", line 42, in main
    print(get_gradient([x]))
  File "/home/josteb/sandbox/keras/keras/backend/tensorflow_backend.py", line 2251, in __call__
    **self.session_kwargs)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape [-1] has negative dimensions
         [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'dense_2_sample_weights', defined at:
  File "gradientlog.py", line 45, in <module>
    main()
  File "gradientlog.py", line 39, in main
    model.compile(loss='mse', optimizer='RMSprop')
  File "/home/josteb/sandbox/keras/keras/models.py", line 783, in compile
    **kwargs)
  File "/home/josteb/sandbox/keras/keras/engine/training.py", line 799, in compile
    name=name + '_sample_weights'))
  File "/home/josteb/sandbox/keras/keras/backend/tensorflow_backend.py", line 435, in placeholder
    x = tf.placeholder(dtype, shape=shape, name=name)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1530, in placeholder
    return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1954, in _placeholder
    name=name)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Shape [-1] has negative dimensions
         [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Answer 1

There are several placeholders related to the gradient computation process in Keras:

Input x
Target y
Sample weights: even if you don't provide them in model.fit(), Keras still generates a placeholder for sample weights and feeds np.ones((y.shape[0],), dtype=K.floatx()) into the graph during training.
Learning phase: this placeholder is connected to the gradient tensor only if some layer uses it (e.g. Dropout).

So, in your example, to compute the gradients you need to feed x, y and sample_weights into the graph. That is the underlying reason for the error: the function was built with model.input only, so the target and sample-weight placeholders were left unfed.
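As a minimal sketch of that feed list, the helper below builds [x, y, sample_weights] with NumPy only. The name build_gradient_feed and its dtype default of "float32" are assumptions made for illustration (K.floatx() is normally "float32" unless reconfigured); it is not part of Keras.

```python
import numpy as np


def build_gradient_feed(x, y, sample_weight=None, dtype="float32"):
    """Return [x, y, sample_weights] in the order inputs, targets, sample weights.

    Hypothetical helper: when no sample weights are given, it falls back to
    an all-ones vector with one entry per sample, as described above.
    """
    x = np.asarray(x, dtype=dtype)
    y = np.asarray(y, dtype=dtype)
    if sample_weight is None:
        sample_weight = np.ones((y.shape[0],), dtype=dtype)
    else:
        sample_weight = np.asarray(sample_weight, dtype=dtype)
    if sample_weight.shape != (y.shape[0],):
        raise ValueError("sample_weight must have shape (n_samples,)")
    return [x, y, sample_weight]


x = np.random.random((128, 1))
feed = build_gradient_feed(x, 2 * x)
print([a.shape for a in feed])
```

A list built this way has one array per placeholder, which is what a function constructed over inputs, targets and sample weights expects.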

Inside Model._make_train_function() the following lines show how to construct the necessary inputs to K.function() in this case:

inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
if self.uses_learning_phase and not isinstance(K.learning_phase(), int):
    inputs += [K.learning_phase()]

with K.name_scope('training'):
    ...
    self.train_function = K.function(inputs,
                                     [self.total_loss] + self.metrics_tensors,
                                     updates=updates,
                                     name='train_function',
                                     **self._function_kwargs)

By mimicking this function, you should be able to get the norm value:

# Uses the same imports as the code in the question.
def get_gradient_norm_func(model):
    # Gradients of the total loss w.r.t. every trainable weight tensor
    grads = K.gradients(model.total_loss, model.trainable_weights)
    summed_squares = [K.sum(K.square(g)) for g in grads]
    # Global L2 norm over all gradient tensors
    norm = K.sqrt(sum(summed_squares))
    # Feed inputs, targets and sample weights, as the train function does
    inputs = model.model._feed_inputs + model.model._feed_targets + model.model._feed_sample_weights
    func = K.function(inputs, [norm])
    return func

def main():
    x = np.random.random((128,)).reshape((-1, 1))
    y = 2 * x
    model = Sequential(layers=[Dense(2, input_shape=(1,)),
                               Dense(1)])
    model.compile(loss='mse', optimizer='rmsprop')
    get_gradient = get_gradient_norm_func(model)
    history = model.fit(x, y, epochs=1)
    # All-ones sample weights match what Keras feeds by default
    print(get_gradient([x, y, np.ones(len(y))]))

Execution output:

Epoch 1/1
128/128 [==============================] - 0s - loss: 2.0073     
[4.4091368]

Note that since you're using Sequential instead of Model, model.model._feed_* is required instead of model._feed_*.
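To sanity-check a value such as the one printed above, it can help to see the same quantity in plain NumPy. The sketch below assumes a simpler model than the one in the question: a single linear layer with a mean-squared-error loss, for which the gradients can be written by hand. The function names global_l2_norm and mse_grads_linear are made up for this illustration.

```python
import numpy as np


def global_l2_norm(grads):
    """Square root of the sum of squared entries over all gradient arrays."""
    return float(np.sqrt(sum(np.sum(np.square(g)) for g in grads)))


def mse_grads_linear(x, y, w, b):
    """Analytic gradients of mean((x @ w + b - y) ** 2) w.r.t. w and b.

    Assumes x has shape (n, d), y has shape (n, 1), w has shape (d, 1)
    and b has shape (1,).
    """
    n = x.shape[0]
    residual = x @ w + b - y                  # shape (n, 1)
    grad_w = 2.0 * x.T @ residual / n         # shape (d, 1)
    grad_b = 2.0 * residual.sum(axis=0) / n   # shape (1,)
    return [grad_w, grad_b]


# The norm treats all gradient arrays as one long vector: sqrt(3^2 + 4^2).
print(global_l2_norm([np.array([3.0]), np.array([[4.0]])]))  # 5.0

x = np.array([[1.0], [2.0]])
y = 2 * x
grads = mse_grads_linear(x, y, w=np.zeros((1, 1)), b=np.zeros(1))
print(global_l2_norm(grads))
```

The norm is taken over all weight tensors together, not per layer, which mirrors summing the per-tensor squared sums before the square root in get_gradient_norm_func.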

Answer 2

Extending josteinb's comment, I'm sharing the version that I have used.

It is basically the same as the previous answer, but this version integrates the norm computation into the usual training routine.

import keras.backend as K
from keras.models import Model

# Get an "l2 norm of gradients" tensor
def get_gradient_norm(model):
    with K.name_scope('gradient_norm'):
        grads = K.gradients(model.total_loss, model.trainable_weights)
        norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
    return norm

# Build a model
model = Model(...)

# Compile the model
model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["categorical_accuracy"],
)

# Append the "l2 norm of gradients" tensor as a metric
model.metrics_names.append("gradient_norm")
model.metrics_tensors.append(get_gradient_norm(model))

# You can compute the norm within the usual training routine
loss, acc, gradient_norm = model.train_on_batch(batch, label)

Answer 1: score 4, accepted, 2017-08-18 20:46:03

Answer 2: score 1, 2018-09-21 17:34:03
