
Implementing seq2seq with sampled decoder outputs

I'm trying to implement seq2seq with a feature to generate sampled outputs from the decoder, i.e. at every step, rather than taking the argmax of the output logits from the previous state, it should sample from them according to the logit distribution and use that as the input for the next step.

After poking around I found the loop_function in seq2seq.py as a promising place to start. It looks like I have to write a loop function like this (modified from the one in the file that extracts argmax + embedding):

def _extract_sample_and_embed(embedding, output_projection=None,
                              update_embedding=True):
    def loop_function(prev, _):
        if output_projection is not None:
            prev = nn_ops.xw_plus_b(prev, output_projection[0], output_projection[1])
        prev_symbol = math_ops.sample(prev)  # <------- Need this op, but it does not exist?
        emb_prev = embedding_ops.embedding_lookup(embedding, prev_symbol)
        if not update_embedding:
            emb_prev = array_ops.stop_gradient(emb_prev)
        return emb_prev
    return loop_function

Then I use this loop function generator in the seq2seq_embedding_with_attention model. However, the op I need, one that samples from a tensor of floats, does not exist in TensorFlow, so do I need to write my own? How do I do that?

  1. In searching for guidance, I found that in tensorflow/tensorflow/python/ops/candidate_sampling_ops there is a reference to:

      from tensorflow.python.ops import gen_candidate_sampling_ops 

    but I can't find this file. I'm guessing it's auto-generated from somewhere. Where?

Currently, you can also do it as follows, with the Gumbel-max trick for discrete distributions:

def batch_gumbel_max_sample(a, max_gumbel_noise=1.0):
    # Gumbel noise: G = -log(-log(U)) with U ~ Uniform(0, 1).
    matrix_U = -1.0*tf.log(-1.0*tf.log(tf.random_uniform(tf.shape(a),
                            minval=0.0, maxval=max_gumbel_noise)))
    # Gumbel-max trick: argmax(logits + G) is a sample from softmax(logits),
    # so the noise must be added to the logits, not subtracted.
    return tf.argmax(tf.add(a, matrix_U), dimension=1)
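As a quick sanity check, repeated sampling should reproduce the softmax of the logits; a minimal sketch, assuming the session-style API of that TensorFlow era:

import numpy as np
import tensorflow as tf

logits = tf.constant([[1.0, 2.0, 3.0]])      # a single batch row of logits
sample_op = batch_gumbel_max_sample(logits)  # shape [1]: one sampled index

counts = np.zeros(3)
with tf.Session() as sess:
    for _ in range(10000):
        counts[sess.run(sample_op)[0]] += 1

# Empirical frequencies should approach softmax([1, 2, 3]) ~ [0.09, 0.24, 0.67].
print(counts / counts.sum())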

There is also a discussion on TensorFlow's issue tracker about this at the moment. I guess sooner or later a multinomial sample function will be added to TensorFlow. LeavesBreathe also posted a workaround on that GitHub page, which isn't entirely correct in my opinion (see the note after the snippet):

def batch_sample_with_temperature(a, temperature=1.0):
    '''This function is like sample_with_temperature except it can handle
    batch input a of [batch_size x logits]. It takes logits as input and
    produces a sampled number from each row's distribution, all done on the
    GPU because this function uses TensorFlow. As you increase the
    temperature, you will get more diversified output but with more errors
    (usually grammatical, if you're doing text).

    Args:
        a -- logits; this must be a 2d array [batch_size x logits]
        temperature -- how much variance you want in output

    Returns:
        Selected number from each row's distribution.

    The equation can be found here:
    https://en.wikipedia.org/wiki/Softmax_function (under reinforcement learning)
    Karpathy did it here as well:
    https://github.com/karpathy/char-rnn/blob/4297a9bf69726823d944ad971555e91204f12ca8/sample.lua
    '''
    with tf.op_scope([a, temperature], "batch_sample_with_temperature"):
        # Start by dividing by the temperature, and get rid of negative numbers with the exponent.
        exponent_raised = tf.exp(tf.div(a, temperature))
        # Normalizing each row will yield probabilities!
        matrix_X = tf.div(exponent_raised, tf.reduce_sum(exponent_raised, reduction_indices=1))
        matrix_U = tf.random_uniform(tf.shape(a), minval=0, maxval=1)
        # You want dimension=1 because you are argmaxing across rows.
        final_number = tf.argmax(tf.sub(matrix_X, matrix_U), dimension=1)
    return final_number
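The problem: taking argmax(probabilities minus uniforms) with an independent uniform per class does not draw from each row's categorical distribution. Inverse-CDF sampling does: draw a single uniform per row and compare it with the row's cumulative probabilities. A minimal sketch of that idea, assuming tf.cumsum is available in your TensorFlow version:

def batch_sample_inverse_cdf(a, temperature=1.0):
    # Softmax with temperature: each row becomes a probability distribution.
    probs = tf.nn.softmax(tf.div(a, temperature))
    # Running row totals: cdf[i, k] = probs[i, 0] + ... + probs[i, k].
    cdf = tf.cumsum(probs, axis=1)
    # One uniform draw per row, broadcast against that row's CDF.
    u = tf.random_uniform([tf.shape(a)[0], 1], minval=0.0, maxval=1.0)
    # The sampled index is the number of CDF entries still below u.
    idx = tf.reduce_sum(tf.cast(tf.less(cdf, u), tf.int64), reduction_indices=1)
    # Guard against float round-off pushing u past the last CDF entry.
    return tf.minimum(idx, tf.cast(tf.shape(a)[1] - 1, tf.int64))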

I met the same problem today, and my solution is:
Replace the line prev_symbol = math_ops.sample(prev) with prev_symbol = squeeze(multinomial(prev, 1), axis=1).

The function tf.multinomial() draws samples from a multinomial distribution. It takes a 2-D Tensor "logits" with shape [batch_size, num_classes] and a 0-D scalar "num_samples" as input, and outputs drawn samples of shape [batch_size, num_samples].

Meanwhile, the loop function expects math_ops.sample() to output samples of shape [batch_size], therefore we need tf.squeeze() to drop the extra dimension.
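Putting the replacement into the question's loop function gives something like the following, a minimal sketch using the public tf.multinomial and tf.squeeze names (tf.multinomial expects logits, which is what the projected prev holds):

def _extract_sample_and_embed(embedding, output_projection=None,
                              update_embedding=True):
    def loop_function(prev, _):
        if output_projection is not None:
            prev = nn_ops.xw_plus_b(prev, output_projection[0], output_projection[1])
        # Draw one token id per batch row from the logit distribution:
        # tf.multinomial returns shape [batch_size, 1]; squeeze to [batch_size].
        prev_symbol = tf.squeeze(tf.multinomial(prev, 1), axis=1)
        emb_prev = embedding_ops.embedding_lookup(embedding, prev_symbol)
        if not update_embedding:
            emb_prev = array_ops.stop_gradient(emb_prev)
        return emb_prev
    return loop_function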

This implementation is simpler.
