
How to override gradient for the nonlinearity functions in lasagne?

I have a model for which I need to compute the gradients of the output with respect to the model's input. However, I want to apply custom gradients for some of the nonlinearity functions applied on some of the model's layers. So I tried the idea explained here, which computes the nonlinear rectifier (ReLU) in the forward pass but modifies its gradients in the backward pass. I added the following two classes:

  • The helper class that allows us to replace a nonlinearity with an Op that has the same output, but a custom gradient:
    class ModifiedBackprop(object):

        def __init__(self, nonlinearity):
            self.nonlinearity = nonlinearity
            self.ops = {}  # memoizes an OpFromGraph instance per tensor type

        def __call__(self, x):
            # OpFromGraph is oblique to Theano optimizations, so we need to move
            # things to GPU ourselves if needed.
            if theano.sandbox.cuda.cuda_enabled:
                maybe_to_gpu = theano.sandbox.cuda.as_cuda_ndarray_variable
            else:
                maybe_to_gpu = lambda x: x
            # We move the input to GPU if needed.
            x = maybe_to_gpu(x)
            # We note the tensor type of the input variable to the nonlinearity
            # (mainly dimensionality and dtype); we need to create a fitting Op.
            tensor_type = x.type
            # If we did not create a suitable Op yet, this is the time to do so.
            if tensor_type not in self.ops:
                # For the graph, we create an input variable of the correct type:
                inp = tensor_type()
                # We pass it through the nonlinearity (and move to GPU if needed).
                outp = maybe_to_gpu(self.nonlinearity(inp))
                # Then we fix the forward expression...
                op = theano.OpFromGraph([inp], [outp])
                # ...and replace the gradient with our own (defined in a subclass).
                op.grad = self.grad
                # Finally, we memoize the new Op.
                self.ops[tensor_type] = op
            # And apply the memoized Op to the input we got.
            return self.ops[tensor_type](x)
  • The subclass that does guided backpropagation through a nonlinearity:
    class GuidedBackprop(ModifiedBackprop):
        def grad(self, inputs, out_grads):
            (inp,) = inputs
            (grd,) = out_grads
            dtype = inp.dtype
            print('It works')
            return (grd * (inp > 0).astype(dtype) * (grd > 0).astype(dtype),)
  • Then I used them in my code as follows:
    import lasagne as nn

    model_in = T.tensor3()
    # model_in = net['input'].input_var
    nn.layers.set_all_param_values(net['l_out'], model['param_values'])

    relu = nn.nonlinearities.rectify
    relu_layers = [layer for layer in nn.layers.get_all_layers(net['l_out'])
                   if getattr(layer, 'nonlinearity', None) is relu]
    modded_relu = GuidedBackprop(relu)

    for layer in relu_layers:
        layer.nonlinearity = modded_relu

    prop = nn.layers.get_output(net['l_out'], model_in, deterministic=True)

    for sample in range(ini, batch_len):
        model_out = prop[sample, 'z']  # get prop for label 'z'
        gradients = theano.gradient.jacobian(model_out, wrt=model_in)
        # gradients = theano.grad(model_out, wrt=model_in)
        get_gradients = theano.function(inputs=[model_in], outputs=gradients)
        grads = get_gradients(X_batch)  # gradient dimension: X_batch == model_in(64, 20, 32)
        grads = np.array(grads)
        grads = grads[sample]

Now when I run the code, it works without any error, and the shape of the output is also correct. But that's because it executes the default theano.grad function and not the one that is supposed to override it. In other words, the grad() function in the GuidedBackprop class is never invoked.
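One way to narrow this down (a diagnostic sketch, not part of the original post) is to print the symbolic graph that get_output() built and check whether the memoized OpFromGraph nodes actually replaced the rectify operations; if they did not, the layers are still using the original nonlinearity:

    import theano.printing

    # Inspect the (unoptimized) forward graph built from the modified layers.
    # If the replacement took effect, OpFromGraph nodes should appear where the
    # Elemwise rectify (maximum(x, 0)) operations used to be.
    theano.printing.debugprint(prop)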

  1. I can't understand what the issue is.
  2. Is there a solution?
  3. If this is an unresolved issue, is there an implementation of a Theano Op that can achieve such functionality, or some other way to override the gradient for specific nonlinearity functions applied on some of the model's layers? (See the sketch below.)
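For reference, newer Theano releases let you attach the custom gradient when the Op is built, instead of overwriting op.grad afterwards. A minimal sketch, assuming a Theano version (0.9 or later) where OpFromGraph accepts a grad_overrides argument; check your version's documentation for the exact signature:

    import theano
    import theano.tensor as T
    from lasagne.nonlinearities import rectify

    def guided_relu_grad(inputs, out_grads):
        # Guided-backprop rule: only propagate where both the input and the
        # incoming gradient are positive.
        (x,) = inputs
        (g,) = out_grads
        dtype = x.dtype
        return [g * (x > 0).astype(dtype) * (g > 0).astype(dtype)]

    inp = T.tensor3()   # match the tensor type fed to the nonlinearity
    outp = rectify(inp)
    guided_relu_op = theano.OpFromGraph([inp], [outp],
                                        grad_overrides=guided_relu_grad)
    # guided_relu_op could then be assigned to layer.nonlinearity as in the code above.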

Have you tried setting the value of the model output back as the model layer's input for all the gradient calculations?

    group_1_ShoryuKen_Left = tf.constant(
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
        shape=(1, 1, 48), dtype=tf.float32)

    # layer_2 = tf.keras.layers.Dense(256, kernel_initializer=tf.constant_initializer(1.))
    layer_2 = tf.keras.layers.LSTM(32, kernel_initializer=tf.constant_initializer(1.))
    b_out = layer_2(group_1_ShoryuKen_Left)
    layer_2.set_weights(layer_1.get_weights())  # layer_1 is not defined in this snippet (presumably defined earlier in the original answer)

(The original answer includes an image showing the resulting gradient values.)
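If the point of the snippet above is simply to obtain the gradient of the layer output with respect to its input, a minimal TensorFlow 2 sketch of that calculation could look like the following (the input tensor here is a stand-in for group_1_ShoryuKen_Left defined above):

    import tensorflow as tf

    x = tf.zeros((1, 1, 48), dtype=tf.float32)  # stand-in for group_1_ShoryuKen_Left
    lstm = tf.keras.layers.LSTM(32, kernel_initializer=tf.constant_initializer(1.))

    with tf.GradientTape() as tape:
        tape.watch(x)                 # plain tensors are not watched automatically
        out = lstm(x)
    grads = tape.gradient(out, x)     # same shape as the input: (1, 1, 48)
    print(grads.shape)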
