
Perplexity calculations rise between each significant drop

I am training a conversational agent using LSTM and tensorflow's translation model. I use batchwise training, which results in a significant drop in the training-data perplexity at the start of each epoch. This drop can be explained by the way I read data into batches: I guarantee that every training pair in my training data is processed exactly once per epoch. When a new epoch starts, the improvements made by the model in the previous epoch pay off as it encounters the training data once more, which shows up as a drop in the graph. Other batchwise approaches, such as the one used in tensorflow's translation model, do not lead to the same behavior, because their methodology is to load the entire training data into memory and randomly pick samples from it.
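The two batching schemes contrasted above can be sketched as follows. This is a minimal illustration, not the actual tensorflow code; the function names and the toy data are placeholders:

```python
import random

def epoch_batches(pairs, batch_size):
    """Epoch-based batching: every training pair is seen exactly once per epoch."""
    pairs = list(pairs)
    random.shuffle(pairs)  # reshuffle at the start of each epoch
    for i in range(0, len(pairs), batch_size):
        yield pairs[i:i + batch_size]

def sampled_batch(pairs, batch_size):
    """Sampling-style batching (as in tensorflow's translation model):
    each step draws a batch with replacement from the full dataset."""
    return [random.choice(pairs) for _ in range(batch_size)]

# With epoch batching, one full pass covers the dataset exactly once,
# so the model re-encounters all training pairs at every epoch boundary.
data = list(range(10))
seen = [x for batch in epoch_batches(data, 3) for x in batch]
assert sorted(seen) == data
```

With the sampling scheme there are no epoch boundaries, so no periodic drop appears in the perplexity curve.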


Step, Perplexity

  • 330000, 19.36
  • 340000, 19.20
  • 350000, 17.79
  • 360000, 17.79
  • 370000, 17.93
  • 380000, 17.98
  • 390000, 18.05
  • 400000, 18.10
  • 410000, 18.14
  • 420000, 18.07
  • 430000, 16.48
  • 440000, 16.75

(A small snippet from the perplexity log, showing drops at 350000 and 430000. Between the drops, the perplexity rises slightly.)

However, my question is about the trend after each drop. From the graph, it is clear that the perplexity rises slightly during every epoch after step ~350000, until the next drop. Can someone give an answer or theory for why this is happening?

This would be a typical case of overfitting.
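One way to check the overfitting hypothesis is to track perplexity on a held-out set alongside the training set: if the model is overfitting, validation perplexity bottoms out and then climbs while training perplexity keeps falling. A minimal sketch, with hypothetical per-epoch cross-entropy values for illustration:

```python
import math

def perplexity(avg_cross_entropy_nats):
    """Perplexity is exp of the average per-token cross-entropy (in nats)."""
    return math.exp(avg_cross_entropy_nats)

# Hypothetical per-epoch losses: training loss keeps falling while the
# held-out loss starts rising -- the classic overfitting signature.
train_ce = [3.0, 2.8, 2.6, 2.5]
valid_ce = [3.1, 2.9, 2.95, 3.05]

train_ppl = [perplexity(ce) for ce in train_ce]
valid_ppl = [perplexity(ce) for ce in valid_ce]

# Overfitting check: validation perplexity reaches a minimum, then climbs.
best_epoch = min(range(len(valid_ppl)), key=valid_ppl.__getitem__)
overfitting = valid_ppl[-1] > valid_ppl[best_epoch]
```

If validation perplexity shows this pattern, early stopping at the best epoch (or regularization such as dropout) is the usual remedy.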

