The right way to define a function in Theano?

Background:

Usually I define a Theano function with inputs like 'x = fmatrix()'. However, while modifying Keras (a deep learning library based on Theano) to make it work with a CTC cost, I noticed a very weird problem: if one input of the cost function is declared as

x = tensor.zeros(shape=[M,N], dtype='float32')

instead of

x = fmatrix()

the training process converges much faster.

A simplified problem:

The full code is quite large, so I tried to simplify the problem as follows. Consider a function that computes the Levenshtein edit distance:

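import theano
from theano import tensor
from theano.ifelse import ifelse
def editdist(s, t):
    # One step of the row-by-row Levenshtein recurrence; scan calls it once per
    # element x of `source`, passing the previous row and the whole `target`.
    def update(x, previous_row, target):
        current_row = previous_row + 1  # deletion of x
        # substitution (free when x equals the target element)
        current_row = tensor.set_subtensor(current_row[1:], tensor.minimum(current_row[1:], tensor.add(previous_row[:-1], tensor.neq(target,x))))
        # insertion
        current_row = tensor.set_subtensor(current_row[1:], tensor.minimum(current_row[1:], current_row[0:-1] + 1))
        return current_row
    # Iterate over the longer sequence; the row is indexed by the shorter one.
    source, target = ifelse(tensor.lt(s.shape[0], t.shape[0]), (t, s), (s, t))
    previous_row = tensor.arange(target.size + 1, dtype=theano.config.floatX)
    result, updates = theano.scan(fn = update, sequences=source, outputs_info=previous_row, non_sequences=target, name='editdist')
    # The last entry of the last row is the edit distance.
    return result[-1,-1]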
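To check which of the two compiled functions below returns the right value, it can help to have a plain-Python reference for the same row-by-row dynamic program. This is a sketch, not the original code: the name editdist_ref is hypothetical, and unlike the Theano version it computes the insertion step with an explicit inner loop instead of a vectorized set_subtensor.

```python
from typing import List, Sequence


def editdist_ref(s: Sequence, t: Sequence) -> int:
    """Row-by-row Levenshtein distance in plain Python, for cross-checking f1/f2."""
    # As in the Theano version, iterate over the longer sequence and keep one
    # row indexed by positions in the shorter one (plus a leading column).
    source, target = (t, s) if len(s) < len(t) else (s, t)
    previous_row: List[int] = list(range(len(target) + 1))
    for x in source:
        # Column 0: distance from the consumed source prefix to the empty target prefix.
        current_row = [previous_row[0] + 1]
        for j, y in enumerate(target, start=1):
            substitution = previous_row[j - 1] + (x != y)  # free when x == y
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1]


print(editdist_ref([1, 2, 3], [1, 3, 3]))  # -> 1
```

For the inputs used below, the reference gives 1 (one substitution, 2 -> 3), so f1's result of 1.0 is the expected one.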

Then I define two functions, f1 and f2:

x1 = tensor.fvector()
x2 = tensor.fvector()
r1 = editdist(x1,x2)
f1 = theano.function([x1,x2], r1)
x3 = tensor.zeros(3, dtype='float32')
x4 = tensor.zeros(3, dtype='float32')
r2 = editdist(x3,x4)
f2 = theano.function([x3,x4], r2)

When computing with f1 and f2, the results differ:

>>> f1([1,2,3],[1,3,3])
   array(1.0)

>>> f2([1,2,3],[1,3,3])
   array(3.0)

f1 gives the correct result, but f2 doesn't.

So my question is: what is the right way to define a Theano function? And what actually went wrong with f2?

Update:

I'm using Theano version 0.8.0.dev0. I just tried Theano 0.7.0, and both f1 and f2 give the correct result. Maybe this is a bug in Theano?

Update_1st 1-27-2016:

According to @lamblin's explanation on this issue ( https://github.com/Theano/Theano/issues/3925#issuecomment-175088918 ), this was actually a bug in Theano, and it has been fixed in the latest (1-26-2016) version. For convenience, lamblin's explanation is quoted here:

"The first way is the most natural one, but in theory both should be equivalent. x3 and x4 are created as the output of an "alloc" operation, the input of which would be the constant 3, rather than free inputs like x1 and x2, but that should not matter since you pass [x3, x4] as inputs to theano.function, which should cut the computation graph right there.

My guess is that scan is optimizing prematurely, believing that x3 or x4 is guaranteed to always be the constant 0, and does some simplifications that proved incorrect when values are provided for them. That would be an actual bug in scan."
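lamblin's hypothesis can be illustrated with a toy expression graph. The sketch below is hypothetical and is not how Theano or scan is implemented: the classes Placeholder, Zeros, Const, Add and the functions constant_fold, evaluate, compile_fn are invented for illustration. A compiler that cuts the graph at the declared inputs honors the fed values, while one that first folds every zeros node into the constant 0 silently ignores them, which mirrors the f2 symptom.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Placeholder:
    """A free symbolic input (the analogue of tensor.fvector())."""
    name: str


@dataclass(frozen=True)
class Zeros:
    """Output of an alloc-style op whose value is 0 (the analogue of tensor.zeros)."""
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Add:
    left: "Node"
    right: "Node"


Node = Union[Placeholder, Zeros, Const, Add]


def constant_fold(node: Node) -> Node:
    """Premature optimization: replace every Zeros node by Const(0.0),
    without checking whether it was declared as a function input."""
    if isinstance(node, Zeros):
        return Const(0.0)
    if isinstance(node, Add):
        return Add(constant_fold(node.left), constant_fold(node.right))
    return node


def evaluate(node: Node, feed: Dict[str, float]) -> float:
    """Evaluate the graph, cutting it at any named node present in `feed`."""
    if isinstance(node, (Placeholder, Zeros)) and node.name in feed:
        return feed[node.name]
    if isinstance(node, Zeros):
        return 0.0
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Add):
        return evaluate(node.left, feed) + evaluate(node.right, feed)
    raise KeyError(f"no value fed for {node!r}")


def compile_fn(output: Node, fold_first: bool):
    """Return a callable; optionally run constant_fold before inputs are bound."""
    graph = constant_fold(output) if fold_first else output
    return lambda feed: evaluate(graph, feed)


x3, x4 = Zeros("x3"), Zeros("x4")
feed = {"x3": 1.0, "x4": 2.0}
print(compile_fn(Add(x3, x4), fold_first=False)(feed))  # -> 3.0 (graph cut at inputs)
print(compile_fn(Add(x3, x4), fold_first=True)(feed))   # -> 0.0 (fed values ignored)
```

In this model, inputs declared as placeholders are unaffected by the folding pass, which is consistent with lamblin's remark that the first way is the natural one.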

Update_2nd 1-27-2016:

Unfortunately, the bug is not completely fixed yet. In the Background section I mentioned that if one input of the cost function is declared with tensor.zeros(), convergence is much faster. I've found the reason: when the input is declared with tensor.zeros(), the cost function gives an incorrect result, though mysteriously this helped convergence. I put together a simplified reproduction demo here ( https://github.com/daweileng/TheanoDebug ); run ctc_bench.py and you can see the results.

theano.tensor.zeros(...) can't take any value other than 0.

Unless, of course, you add nodes to the graph and modify parts of the zeros tensor using theano.tensor.set_subtensor.

The input tensor theano.tensor.fmatrix can take any value you feed it.
