
Isn't Tensorflow RNN PTB tutorial test measure and state reset wrong?

I have two questions about the TensorFlow PTB RNN tutorial code ptb_word_lm.py. The code blocks below are from that file.

  1. Is it okay to reset the state for every batch?

     self._initial_state = cell.zero_state(batch_size, data_type())  # line 133: initial state is all zeros

     with tf.device("/cpu:0"):
       embedding = tf.get_variable(
           "embedding", [vocab_size, size], dtype=data_type())
       inputs = tf.nn.embedding_lookup(embedding, input_.input_data)

     if is_training and config.keep_prob < 1:
       inputs = tf.nn.dropout(inputs, config.keep_prob)

     outputs = []
     state = self._initial_state  # line 153: the unrolled RNN starts from the zero state
     with tf.variable_scope("RNN"):
       for time_step in range(num_steps):
         if time_step > 0: tf.get_variable_scope().reuse_variables()
         (cell_output, state) = cell(inputs[:, time_step, :], state)
         outputs.append(cell_output)

    In line 133, we set the initial state to zero. Then, at line 153, we use that zero state as the starting state of the RNN steps. This means the starting state of every batch is set to zero. I believe that if we want to apply BPTT (backpropagation through time), we should feed in an external (non-zero) state at the step where the previous data left off, like a stateful RNN in Keras.

    I found that resetting the starting state to zero works in practice. But is there any theoretical background (or paper) explaining why this works?

  2. Is it okay to measure test perplexity like this?

     eval_config = get_config()
     eval_config.batch_size = 1
     eval_config.num_steps = 1

    Related to the previous question... The model fixes the initial state to zero for every batch. However, at lines 337~338, we set batch size 1 and num steps 1 for the test configuration. Then, for the test data, we will feed a single token each time and predict the next one without any context(!), because the state will be zero for every batch (which has only one time step).

    Is this a correct measure for the test data? Do other language-model papers measure test perplexity by predicting the next word without context? (For what the perplexity number itself computes, see the short sketch right after this list.)
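
For reference, the perplexity the tutorial reports is just the exponential of the average per-word cross-entropy over the evaluated tokens. A minimal sketch with made-up loss values (none of this comes from the tutorial's code):

    import numpy as np

    # Hypothetical per-word cross-entropy losses (in nats) collected over a test run.
    per_word_loss = np.array([4.2, 5.1, 3.8, 4.6])

    # Perplexity is exp of the average negative log-likelihood per word.
    perplexity = np.exp(per_word_loss.mean())
    print(perplexity)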

I ran this code and got a result similar to what the code's comments and the original paper report. If this code is wrong, which I hope it is not, do you have any idea how to replicate the paper's results? Maybe I can open a pull request if I fix these problems.

Re (1), the code does (cell_output, state) = cell(inputs[:, time_step, :], state). This assigns the state for the next time step to be the output state of this time step.

When you start a new batch, you should do so independently of the computation you've done so far (note the distinction between batches, which are completely different examples, and time steps within the same sequence).
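
If the batches inside an epoch are in fact consecutive slices of one long text stream (as they are for PTB), context can be preserved by feeding the final state of one batch back in as the initial state of the next, stateful-RNN style. Below is a minimal TF 1.x-style sketch of that pattern with a single-layer BasicLSTMCell; the model, names, and toy data are illustrative assumptions, not the tutorial's code.

    import numpy as np
    import tensorflow as tf  # assumes the TF 1.x API the tutorial is written against

    batch_size, num_steps, hidden_size, vocab_size = 2, 4, 16, 50

    inputs = tf.placeholder(tf.int32, [batch_size, num_steps])
    embedding = tf.get_variable("embedding", [vocab_size, hidden_size])
    emb = tf.nn.embedding_lookup(embedding, inputs)

    cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)
    initial_state = cell.zero_state(batch_size, tf.float32)  # LSTMStateTuple(c, h)

    state = initial_state
    with tf.variable_scope("RNN"):
        for t in range(num_steps):
            if t > 0:
                tf.get_variable_scope().reuse_variables()
            output, state = cell(emb[:, t, :], state)
    final_state = state

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Start from the zero state only once, at the beginning of the data stream.
        np_state = sess.run(initial_state)
        batches = np.random.randint(vocab_size, size=(5, batch_size, num_steps)).astype(np.int32)
        for batch in batches:
            # Feed the previous batch's final state back in as the next batch's
            # initial state instead of resetting it to zeros.
            np_state = sess.run(final_state,
                                feed_dict={inputs: batch,
                                           initial_state.c: np_state.c,
                                           initial_state.h: np_state.h})

With a MultiRNNCell the state would be a tuple of LSTMStateTuples, so each layer's c and h would be threaded through in the same way.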

Re (2), most of the time, context is used.
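
For example, even with batch_size = 1 and num_steps = 1, context is kept by carrying the recurrent state forward from one test step to the next, so every prediction still conditions on the whole left context. A framework-free sketch with a toy random-weight RNN (purely illustrative, not the tutorial's model) showing the loop structure and the perplexity computation:

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, hidden_size = 20, 8

    # Toy, untrained RNN language model: only the evaluation loop matters here.
    Wxh = 0.1 * rng.normal(size=(vocab_size, hidden_size))
    Whh = 0.1 * rng.normal(size=(hidden_size, hidden_size))
    Why = 0.1 * rng.normal(size=(hidden_size, vocab_size))

    def step(token, h):
        """One num_steps=1 'batch': consume one token, return next-word probs and the new state."""
        h = np.tanh(Wxh[token] + h @ Whh)
        logits = h @ Why
        probs = np.exp(logits - logits.max())
        return probs / probs.sum(), h

    test_tokens = rng.integers(vocab_size, size=200)

    h = np.zeros(hidden_size)      # zero state only once, at the very start of the test stream
    total_nll, count = 0.0, 0
    for cur, nxt in zip(test_tokens[:-1], test_tokens[1:]):
        probs, h = step(cur, h)    # the state carries over, so context accumulates
        total_nll += -np.log(probs[nxt])
        count += 1

    print("test perplexity:", np.exp(total_nll / count))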
