
Perplexity rises slightly between each significant drop

I am training a conversational agent with an LSTM, based on TensorFlow's translation model. I train batchwise, and the training-data perplexity drops significantly at the start of each epoch. The drop can be explained by the way I read data into batches: I guarantee that every training pair in my training data is processed exactly once per epoch. When a new epoch starts, the improvements the model made during the previous epoch pay off as it encounters the training data once more, which shows up as a drop in the graph. Other batchwise approaches, such as the one used in TensorFlow's translation model, do not lead to the same behavior, because they load the entire training set into memory and pick samples from it at random.
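To make the difference concrete, here is a minimal sketch of the two batching strategies described above (the function names `epoch_batches` and `random_batches` are my own, not from TensorFlow's code): sequential per-epoch iteration visits every pair exactly once per epoch and so has a hard epoch boundary, while uniform random sampling has no such boundary.

```python
import random

def epoch_batches(pairs, batch_size):
    """Yield every training pair exactly once per epoch, in order.
    Each new epoch re-visits data the model has already improved on,
    which is what shows up as a perplexity drop at the epoch boundary."""
    for i in range(0, len(pairs), batch_size):
        yield pairs[i:i + batch_size]

def random_batches(pairs, batch_size, num_batches, seed=0):
    """Sample batches uniformly with replacement (the translation-model
    style): there is no epoch boundary, hence no periodic drop."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [rng.choice(pairs) for _ in range(batch_size)]

pairs = list(range(10))
seen = [p for batch in epoch_batches(pairs, 3) for p in batch]
assert sorted(seen) == pairs  # each pair seen exactly once per epoch
```

With random sampling, some pairs may be seen several times and others not at all within any given window of steps, so there is no single moment at which "the whole dataset has just been re-encountered."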


Step, Perplexity

  • 330000, 19.36
  • 340000, 19.20
  • 350000, 17.79
  • 360000, 17.79
  • 370000, 17.93
  • 380000, 17.98
  • 390000, 18.05
  • 400000, 18.10
  • 410000, 18.14
  • 420000, 18.07
  • 430000, 16.48
  • 440000, 16.75

(A small snippet of the perplexity log showing drops at steps 350000 and 430000. Between the drops, the perplexity rises slightly.)

However, my question is about the trend after each drop. From the graph it is clear that the perplexity rises slightly within each epoch (after step ~350000) until the next drop. Can anyone offer an explanation or theory for why this happens?

This would be a classic case of overfitting.
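One way to check this hypothesis is to track perplexity on a held-out set alongside the training set: under overfitting, training perplexity keeps falling while held-out perplexity stalls or rises. A minimal sketch, using hypothetical loss values purely for illustration (perplexity is the exponential of the mean per-token cross-entropy in nats):

```python
import math

def perplexity(mean_cross_entropy_nats):
    """Perplexity = exp(mean per-token cross-entropy, in nats)."""
    return math.exp(mean_cross_entropy_nats)

# Hypothetical per-epoch losses illustrating the overfitting pattern:
# training loss keeps falling, held-out loss turns around and rises.
train_loss = [3.00, 2.95, 2.90, 2.85]
valid_loss = [3.10, 3.05, 3.07, 3.12]

train_ppl = [perplexity(l) for l in train_loss]
valid_ppl = [perplexity(l) for l in valid_loss]
```

If your real validation curve looks like `valid_ppl` here (falling, then rising), the slight within-epoch rise on the training data is consistent with the model fitting earlier batches at the expense of later ones.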
