
How to override gradient for the nonlinearity functions in lasagne?

I have a model, and I need to compute the gradient of the model's output with respect to its input. However, I want to apply custom gradients for some of the nonlinearity functions used in certain layers of the model. So I tried the idea explained here, which computes the rectified linear unit (ReLU) as usual in the forward pass but modifies the gradient of the ReLU in the backward pass. I added the following two classes:

  • A helper class that lets us replace a nonlinearity with an Op that has the same output but a custom gradient:
class ModifiedBackprop(object):

    def __init__(self, nonlinearity):
        self.nonlinearity = nonlinearity
        self.ops = {}  # memoizes an OpFromGraph instance per tensor type

    def __call__(self, x):
        # OpFromGraph is oblique to Theano optimizations, so we need to move
        # things to GPU ourselves if needed.
        if theano.sandbox.cuda.cuda_enabled:
            maybe_to_gpu = theano.sandbox.cuda.as_cuda_ndarray_variable
        else:
            maybe_to_gpu = lambda x: x
        # We move the input to GPU if needed.
        x = maybe_to_gpu(x)
        # We note the tensor type of the input variable to the nonlinearity
        # (mainly dimensionality and dtype); we need to create a fitting Op.
        tensor_type = x.type
        # If we did not create a suitable Op yet, this is the time to do so.
        if tensor_type not in self.ops:
            # For the graph, we create an input variable of the correct type:
            inp = tensor_type()
            # We pass it through the nonlinearity (and move to GPU if needed).
            outp = maybe_to_gpu(self.nonlinearity(inp))
            # Then we fix the forward expression...
            op = theano.OpFromGraph([inp], [outp])
            # ...and replace the gradient with our own (defined in a subclass).
            op.grad = self.grad
            # Finally, we memoize the new Op
            self.ops[tensor_type] = op
        # And apply the memoized Op to the input we got.
        return self.ops[tensor_type](x)
  • A subclass that guides backpropagation through the nonlinearity:
class GuidedBackprop(ModifiedBackprop):

    def grad(self, inputs, out_grads):
        (inp,) = inputs
        (grd,) = out_grads
        dtype = inp.dtype
        print('It works')
        return (grd * (inp > 0).astype(dtype) * (grd > 0).astype(dtype),)
  • Then I use them in my code as follows:
import numpy as np
import theano
import theano.tensor as T
import lasagne as nn

# net, model, ini, batch_len and X_batch are defined earlier (not shown here).
model_in = T.tensor3()  # model_in = net['input'].input_var
nn.layers.set_all_param_values(net['l_out'], model['param_values'])

relu = nn.nonlinearities.rectify
relu_layers = [layer for layer in nn.layers.get_all_layers(net['l_out'])
               if getattr(layer, 'nonlinearity', None) is relu]
modded_relu = GuidedBackprop(relu)
for layer in relu_layers:
    layer.nonlinearity = modded_relu

prop = nn.layers.get_output(net['l_out'], model_in, deterministic=True)

for sample in range(ini, batch_len):
    model_out = prop[sample, 'z']  # get prop for label 'z'
    gradients = theano.gradient.jacobian(model_out, wrt=model_in)
    # gradients = theano.grad(model_out, wrt=model_in)
    get_gradients = theano.function(inputs=[model_in], outputs=gradients)
    grads = get_gradients(X_batch)  # gradient dimension: X_batch == model_in (64, 20, 32)
    grads = np.array(grads)
    grads = grads[sample]

Now, when I run the code, it executes without errors and the output has the correct shape. But that is only because it falls back to the default theano.grad behaviour rather than the gradient that is supposed to override it. In other words, the grad() function of the GuidedBackprop class is never called (its print statement never fires).
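A stripped-down reproduction that takes my model out of the equation, assuming the two classes above are defined (the toy input here is only illustrative): if the override were honoured, grad() would print 'It works' while the gradient expression is being built.

import numpy as np
import theano
import theano.tensor as T
import lasagne as nn

# Apply GuidedBackprop directly to a single variable and differentiate it.
# If the instance-level op.grad override is picked up, 'It works' is printed
# while theano.grad builds the expression below.
modded_relu = GuidedBackprop(nn.nonlinearities.rectify)
x = T.matrix('x')
g = theano.grad(modded_relu(x).sum(), wrt=x)
f = theano.function([x], g)
print(f(np.array([[1.0, -2.0], [3.0, -4.0]], dtype=theano.config.floatX)))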

  1. What is going wrong here? I do not understand why the override is ignored.
  2. Is there a way to fix this?
  3. If this is an open issue, is there an implementation of a Theano Op that can achieve this, or some other way to override the gradient of a specific nonlinearity function applied to certain layers of a model? (A sketch of one route I am considering follows this list.)
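For reference, newer Theano releases (0.9 and later, if I am not mistaken) expose a grad_overrides argument on OpFromGraph, which looks like it could replace the op.grad monkey-patching above. A minimal sketch of that route, with illustrative names and a toy input, assuming such a Theano version is available:

import numpy as np
import theano
import theano.tensor as T

def guided_relu_grad(inputs, output_grads):
    # Guided backprop: only let the gradient through where both the input
    # and the incoming gradient are positive.
    (inp,) = inputs
    (grd,) = output_grads
    dtype = inp.dtype
    return [grd * (inp > 0).astype(dtype) * (grd > 0).astype(dtype)]

x = T.matrix('x')
guided_relu_op = theano.OpFromGraph([x], [T.nnet.relu(x)],
                                    grad_overrides=guided_relu_grad)

# Quick check that the override actually shows up in the gradient.
inp = T.matrix('inp')
g = theano.grad(guided_relu_op(inp).sum(), wrt=inp)
f = theano.function([inp], g)
print(f(np.array([[1.0, -2.0], [3.0, -4.0]], dtype=theano.config.floatX)))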

Have you tried setting the values from the model's output back as the input of the model's layer, and running all of the gradient calculations from there?

import tensorflow as tf

group_1_ShoryuKen_Left = tf.constant(
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    shape=(1, 1, 48), dtype=tf.float32)

## layer_2 = tf.keras.layers.Dense(256, kernel_initializer=tf.constant_initializer(1.))
layer_2 = tf.keras.layers.LSTM(32, kernel_initializer=tf.constant_initializer(1.))
b_out = layer_2(group_1_ShoryuKen_Left)
# layer_1 is assumed to be defined earlier in this answer's setup (not shown here).
layer_2.set_weights(layer_1.get_weights())
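If the underlying goal is the gradient of the layer's output with respect to its input, one way to read that off in TensorFlow 2 is with tf.GradientTape. This is only an illustrative sketch built on the constant and layer from above, not part of the original code:

import tensorflow as tf

group_1_ShoryuKen_Left = tf.constant(
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    shape=(1, 1, 48), dtype=tf.float32)
layer_2 = tf.keras.layers.LSTM(32, kernel_initializer=tf.constant_initializer(1.))

with tf.GradientTape() as tape:
    # The input is a plain constant, so it has to be watched explicitly.
    tape.watch(group_1_ShoryuKen_Left)
    b_out = layer_2(group_1_ShoryuKen_Left)

# d(sum of b_out) / d(input), one value per input element: shape (1, 1, 48).
grads = tape.gradient(b_out, group_1_ShoryuKen_Left)
print(grads.shape)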

[Image: gradient values]

