
Defining a gradient with respect to a subtensor in Theano

I have what is conceptually a simple question about Theano, but I haven't been able to find the answer (I'll confess upfront to not really understanding how shared variables work in Theano, despite many hours with the tutorials).

I'm trying to implement a "deconvolutional network". Specifically, I have a 3-tensor of inputs (each input is a 2D image) and a 4-tensor of codes; for the ith input, codes[i] represents a set of codewords which together code for input i.

I've been having a lot of trouble figuring out how to do gradient descent on the codewords. Here are the relevant parts of my code:

idx = T.lscalar()
pre_loss_conv = conv2d(input = codes[idx].dimshuffle('x', 0, 1,2),
                       filters = dicts.dimshuffle('x', 0,1, 2),
                       border_mode = 'valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[idx]
loss = T.sum(1./2.*(loss_in - loss_conv)**2) 

del_codes = T.grad(loss, codes[idx])
delc_fn = function([idx], del_codes)
train_codes = function([input_index], loss, updates = [
    [codes, T.set_subtensor(codes[input_index], codes[input_index] - 
                            learning_rate*del_codes[input_index])     ]])

(Here codes and dicts are shared tensor variables.) Theano is unhappy with this, specifically with the definition

del_codes = T.grad(loss, codes[idx])

The error message I'm getting is: theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Subtensor{int64}.0

I'm guessing that it wants a symbolic variable instead of codes[idx], but then I'm not sure how to get everything connected to get the intended effect. I'm guessing I'll need to change the final line to something like

learning_rate*del_codes)     ]])

Can someone give me some pointers as to how to define this function properly? I think I'm probably missing something basic about working with Theano, but I'm not sure what.

Thanks in advance!

-Justin

Update: Kyle's suggestion worked very nicely. Here's the specific code I used:

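import theano.tensor as T
from theano import function
from theano.tensor.nnet import conv2d

# codes, dicts, inputs, learning_rate and num_inputs are defined elsewhere in the script.
input_index = T.lscalar('input_index')

# Bind the slice to a name once, so the loss and the gradient refer to the same variable.
current_codes = codes[input_index]
pre_loss_conv = conv2d(input = current_codes.dimshuffle('x', 0, 1, 2),
                       filters = dicts.dimshuffle('x', 0, 1, 2),
                       border_mode = 'valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[input_index]
loss = T.sum(1./2.*(loss_in - loss_conv)**2)

del_codes = T.grad(loss, current_codes)
del_dicts = T.grad(loss, dicts)  # assumed definition; not shown in the original post
train_codes = function([input_index], loss)
train_dicts = function([input_index], loss, updates = [[dicts, dicts - learning_rate*del_dicts]])
# Write the gradient step back into the matching slice of the shared variable.
codes_update = ( codes, T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes) )
codes_update_fn = function([input_index], updates = [codes_update])

for i in range(num_inputs):
    current_loss = train_codes(i)
    codes_update_fn(i)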
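
For anyone unsure what the codes_update pair does, here is a rough NumPy sketch of the same idea. It is not Theano code: the set_subtensor helper below is a hypothetical stand-in that mimics T.set_subtensor by returning a copy with one slice replaced, and the shapes and the random stand-in gradient are made up for illustration.

```python
import numpy as np


def set_subtensor(x, index, new_value):
    """Hypothetical NumPy stand-in for T.set_subtensor(x[index], new_value).

    Returns a copy of x with x[index] replaced; x itself is left untouched.
    """
    out = x.copy()
    out[index] = new_value
    return out


rng = np.random.default_rng(0)
# Assumed shapes: (num_inputs, num_codewords, height, width)
codes = rng.normal(size=(3, 2, 4, 4))
# Stand-in for the gradient of the loss with respect to one slice of codes
del_codes = rng.normal(size=(2, 4, 4))
learning_rate = 0.1
i = 1

# One gradient step on slice i only; this is the value the update assigns back to codes.
new_codes = set_subtensor(codes, i, codes[i] - learning_rate * del_codes)
```

Only slice i changes; every other slice of codes is carried over as-is, which is why the gradient only needs to be taken with respect to that one slice.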

To summarize the findings:

Assigning grad_var = codes[input_index], building the loss from grad_var and taking the gradient with respect to it (del_codes = T.grad(loss, grad_var)), then making a new variable such as: subgrad = T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes)

Then calling train_codes = function([input_index], loss, updates = [[codes, subgrad]])

seemed to do the trick. In general, I try to make variables for as many things as possible. Tricky problems can arise from trying to do too much in a single statement, and such code is hard to debug and understand later! Also, in this case I think Theano needs a shared variable, but has issues if the shared variable is created inside the function that requires it.
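
One way to see why binding the slice to a name helps: each time codes[idx] is written, a new symbolic Subtensor variable is built, and the gradient is looked up against the specific variable objects used in the cost. The toy model below is only a sketch of that idea in plain Python. The Node class and in_graph function are hypothetical and are not Theano's real classes or API.

```python
class Node:
    """Toy stand-in for a symbolic variable (hypothetical, not a Theano class)."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)

    def __getitem__(self, index):
        # Like indexing a symbolic tensor: every call builds a brand-new node.
        return Node("Subtensor", (self, index))

    def __sub__(self, other):
        return Node("sub", (self, other))


def in_graph(cost, wrt):
    """Return True if `wrt` is reachable from `cost`, comparing nodes by identity."""
    stack, seen = [cost], set()
    while stack:
        node = stack.pop()
        if node is wrt:
            return True
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.parents)
    return False


codes = Node("codes")
inputs = Node("inputs")
idx = Node("idx")

# Pattern from the question: the loss uses one codes[idx] node,
# and the gradient is requested for a second, freshly built one.
loss_a = inputs[idx] - codes[idx]
fresh_is_connected = in_graph(loss_a, codes[idx])

# Pattern from the fix: bind the slice once and reuse the same object.
current_codes = codes[idx]
loss_b = inputs[idx] - current_codes
bound_is_connected = in_graph(loss_b, current_codes)

print(fresh_is_connected, bound_is_connected)  # prints: False True
```

In this toy model the freshly built slice is never found in the loss graph, which matches the "not part of the computational graph of the cost" wording of the error, while the named slice is found.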

Glad this worked for you!
