
Output Projection in Seq2Seq model Tensorflow

I'm going through the translation code implemented in TensorFlow using a seq2seq model. I'm following the TensorFlow tutorial about the seq2seq model.

In that tutorial there is a part explaining a concept called output projection, which they have implemented in the seq2seq_model.py code. I understand the code, but I don't understand what this output projection part is doing.

It would be great if someone could explain to me what is going on behind this output projection.

Thank You!!

Internally, a neural network operates on dense vectors of some size, often 256, 512 or 1024 floats (let's say 512 here). But at the end it needs to predict a word from the vocabulary, which is often much larger, e.g., 40,000 words. Output projection is the final linear layer that converts (projects) the internal representation into that larger vocabulary space. So, for example, it can consist of a 512 x 40,000 parameter matrix and a 40,000-dimensional bias vector. The reason it is kept separate in the seq2seq code is that some loss functions (e.g., the sampled softmax loss) need direct access to the final 512-sized vector and the output projection matrix. Hope that helps!
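Here is a minimal sketch of that idea, not the tutorial's exact code. The sizes (512-dim decoder output, 40,000-word vocabulary, batch of 32, 512 sampled classes) and variable names are illustrative assumptions; the weight matrix is stored transposed because tf.nn.sampled_softmax_loss expects weights of shape [num_classes, dim].

```python
import tensorflow as tf

# Illustrative sizes (assumptions): 512-dim decoder state, 40,000-word vocabulary.
hidden_size = 512
vocab_size = 40000

# Output projection parameters: a weight matrix and a bias vector.
# Kept transposed ([vocab, hidden]) so they can be fed to sampled_softmax_loss.
w_t = tf.Variable(tf.random.truncated_normal([vocab_size, hidden_size], stddev=0.1))
b = tf.Variable(tf.zeros([vocab_size]))
output_projection = (tf.transpose(w_t), b)  # ([hidden, vocab], [vocab])

# One 512-dim decoder output vector per example (batch of 32, an assumption).
decoder_output = tf.random.normal([32, hidden_size])

# At decoding time: project to full-vocabulary logits and pick the best word.
logits = tf.matmul(decoder_output, output_projection[0]) + output_projection[1]
predicted_ids = tf.argmax(logits, axis=-1)  # shape [32]

# At training time: sampled softmax consumes the 512-dim vectors and the
# projection parameters directly, which is why the projection is kept
# separate instead of being folded into the RNN cell.
labels = tf.random.uniform([32, 1], maxval=vocab_size, dtype=tf.int64)
loss = tf.nn.sampled_softmax_loss(
    weights=w_t, biases=b, labels=labels, inputs=decoder_output,
    num_sampled=512, num_classes=vocab_size)
```

The point of the split is visible in the last two steps: only decoding needs the full 512 x 40,000 matrix multiply, while training can approximate the softmax over a small sample of classes.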
