How to override gradients for the nonlinearity functions in Lasagne?
I have a model for which I need to compute the gradients of the output with respect to the model's input. But I want to apply custom gradients for some of the nonlinearity functions applied on some of the model's layers. So I tried the idea explained here, which computes the nonlinear rectifier (ReLU) in the forward pass but modifies the gradients of ReLU in the backward pass. I added the following two classes:
class ModifiedBackprop(object):

    def __init__(self, nonlinearity):
        self.nonlinearity = nonlinearity
        self.ops = {}  # memoizes an OpFromGraph instance per tensor type

    def __call__(self, x):
        # OpFromGraph is oblique to Theano optimizations, so we need to move
        # things to GPU ourselves if needed.
        if theano.sandbox.cuda.cuda_enabled:
            maybe_to_gpu = theano.sandbox.cuda.as_cuda_ndarray_variable
        else:
            maybe_to_gpu = lambda x: x
        # We move the input to GPU if needed.
        x = maybe_to_gpu(x)
        # We note the tensor type of the input variable to the nonlinearity
        # (mainly dimensionality and dtype); we need to create a fitting Op.
        tensor_type = x.type
        # If we did not create a suitable Op yet, this is the time to do so.
        if tensor_type not in self.ops:
            # For the graph, we create an input variable of the correct type:
            inp = tensor_type()
            # We pass it through the nonlinearity (and move to GPU if needed).
            outp = maybe_to_gpu(self.nonlinearity(inp))
            # Then we fix the forward expression...
            op = theano.OpFromGraph([inp], [outp])
            # ...and replace the gradient with our own (defined in a subclass).
            op.grad = self.grad
            # Finally, we memoize the new Op
            self.ops[tensor_type] = op
        # And apply the memoized Op to the input we got.
        return self.ops[tensor_type](x)
class GuidedBackprop(ModifiedBackprop):

    def grad(self, inputs, out_grads):
        (inp,) = inputs
        (grd,) = out_grads
        dtype = inp.dtype
        print('It works')
        # Guided backprop: only let the gradient through where both the
        # input and the incoming gradient are positive.
        return (grd * (inp > 0).astype(dtype) * (grd > 0).astype(dtype),)
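Before wiring this into the full network, a quick self-contained check (a minimal sketch with made-up input values, not part of the original post) can confirm that differentiating through the wrapped Op routes the backward pass through GuidedBackprop.grad. Note that the print fires while Theano builds the gradient graph, not when the compiled function runs:

import numpy as np
import theano
import theano.tensor as T
import lasagne

x = T.vector('x')
guided = GuidedBackprop(lasagne.nonlinearities.rectify)
y = guided(x)                    # forward pass: plain ReLU via OpFromGraph
g = theano.grad(y.sum(), wrt=x)  # 'It works' should be printed here
f = theano.function([x], g)
print(f(np.array([-1.0, 2.0, 3.0], dtype=theano.config.floatX)))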
import numpy as np
import theano
import theano.tensor as T
import lasagne as nn

# net, model, ini, batch_len and X_batch are defined elsewhere in the script.
model_in = T.tensor3()
# model_in = net['input'].input_var
nn.layers.set_all_param_values(net['l_out'], model['param_values'])
relu = nn.nonlinearities.rectify
relu_layers = [layer for layer in nn.layers.get_all_layers(net['l_out'])
               if getattr(layer, 'nonlinearity', None) is relu]
modded_relu = GuidedBackprop(relu)

for layer in relu_layers:
    layer.nonlinearity = modded_relu

prop = nn.layers.get_output(net['l_out'], model_in, deterministic=True)

for sample in range(ini, batch_len):
    model_out = prop[sample, 'z']  # get prop for label 'z'
    gradients = theano.gradient.jacobian(model_out, wrt=model_in)
    # gradients = theano.grad(model_out, wrt=model_in)
    get_gradients = theano.function(inputs=[model_in], outputs=gradients)
    grads = get_gradients(X_batch)  # gradient dimension: X_batch == model_in (64, 20, 32)
    grads = np.array(grads)
    grads = grads[sample]
Now when I run the code, it works without any error, and the shape of the output is also correct. But that's because it executes the default theano.grad function and not the one that is supposed to override it. In other words, the grad() function in the class GuidedBackprop is never invoked.
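One thing worth checking: GuidedBackprop.grad runs while Theano constructs the backward graph (i.e., during the theano.gradient.jacobian or theano.grad call), not when the compiled function executes, and the swap only takes effect if the nonlinearities are replaced before get_output builds the forward expression. A minimal network-level sketch with hypothetical layer sizes (CPU only, reusing the classes above) that should trigger the print at graph-construction time:

import numpy as np
import theano
import theano.tensor as T
import lasagne as nn

x = T.matrix('x')
l_in = nn.layers.InputLayer(shape=(None, 4), input_var=x)
l_dense = nn.layers.DenseLayer(l_in, num_units=3,
                               nonlinearity=nn.nonlinearities.rectify)
# Swap the nonlinearity BEFORE get_output builds the expression:
l_dense.nonlinearity = GuidedBackprop(nn.nonlinearities.rectify)
out = nn.layers.get_output(l_dense, deterministic=True)
# 'It works' should be printed while this line builds the backward graph:
g = theano.grad(out.sum(), wrt=x)
f = theano.function([x], g)
print(f(np.random.randn(2, 4).astype(theano.config.floatX)))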
Did you try to set the value of the model output back into the model layer's input for all the gradient calculations?
import tensorflow as tf

# A single example with one timestep of 48 features.
group_1_ShoryuKen_Left = tf.constant([
    0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
], shape=(1, 1, 48), dtype=tf.float32)

## layer_2 = tf.keras.layers.Dense(256, kernel_initializer=tf.constant_initializer(1.))
layer_2 = tf.keras.layers.LSTM(32, kernel_initializer=tf.constant_initializer(1.))
b_out = layer_2(group_1_ShoryuKen_Left)
# layer_1 is assumed to be a previously built layer whose weight shapes
# match layer_2's.
layer_2.set_weights(layer_1.get_weights())
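Since this answer is in TensorFlow: the guided-backprop rule from the question can also be expressed directly in TF 2.x with tf.custom_gradient. The following is a hedged sketch, not part of the original answer; guided_relu is a name introduced here:

import tensorflow as tf

@tf.custom_gradient
def guided_relu(x):
    # Forward pass: ordinary ReLU.
    y = tf.nn.relu(x)
    def grad(dy):
        # Backward pass: only propagate the gradient where both the input
        # and the incoming gradient are positive (guided backprop).
        return dy * tf.cast(x > 0, dy.dtype) * tf.cast(dy > 0, dy.dtype)
    return y, grad

x = tf.constant([-1.0, 2.0, 3.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = guided_relu(x)
print(tape.gradient(y, x))  # the gradient follows the custom rule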