
How to design the output layer of a word-RNN model using word2vec embeddings

I am trying to build a word-RNN equivalent of char-RNN; the network should generate the next word in a sentence.

As input I use pre-trained 100-dimensional word2vec vectors, and the hidden layer size is 200. My main problem is the output layer: how should it be designed?

In char-RNN the output is a vocabulary-sized vector (one entry per unique character) holding a probability distribution over characters (softmax), so generating the next character is simply a matter of sampling from this distribution. But with word2vec, where my word vocabulary is over 300k entries, this approach does not seem feasible.
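
For reference, the char-RNN sampling step I mean looks roughly like this (a minimal NumPy illustration; the character count of 65 and the random logits are just stand-ins):

    import numpy as np

    # Stand-in for the char-RNN's output scores over a 65-character vocabulary.
    logits = np.random.randn(65)

    # Numerically stable softmax turns scores into a probability distribution.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Generating the next character = sampling an index from that distribution.
    next_char_id = np.random.choice(len(probs), p=probs)
    print(next_char_id)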

Should my output layer generate a 100-dim vector, after which I find the nearest word using gensim's similar_by_vector function?
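
In other words, something like the following sketch (the model path is a placeholder, and the random vector stands in for the RNN's 100-dim output):

    import numpy as np
    from gensim.models import KeyedVectors

    # "word2vec_100d.bin" is a hypothetical path to the pre-trained 100-dim vectors.
    wv = KeyedVectors.load_word2vec_format("word2vec_100d.bin", binary=True)

    predicted_vec = np.random.rand(100).astype(np.float32)  # stand-in for the RNN output

    # similar_by_vector returns (word, cosine similarity) pairs, nearest first.
    next_word, similarity = wv.similar_by_vector(predicted_vec, topn=1)[0]
    print(next_word, similarity)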

Could you point me to a good, easy-to-understand Python/TensorFlow implementation, e.g. a link to a GitHub repo or a publication?

I have found a similar question, but it doesn't answer my question:

You can output an index of a word (per example), thus avoiding the one-hot word representation (which is indeed very big). Use tf.contrib.legacy_seq2seq.sequence_loss_by_example:

Weighted cross-entropy loss for a sequence of logits (per example).

  • logits: List of 2D Tensors of shape [batch_size x num_decoder_symbols].
  • targets: List of 1D batch-sized int32 Tensors of the same length as logits.
  • weights: List of 1D batch-sized float-Tensors of the same length as logits.

Note that it doesn't reduce the size of your model, but it saves a lot of memory by computing the loss from sparsely encoded labels. A complete example of a word-RNN implementation can be found here, and it uses exactly this approach.
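
Below is a rough sketch of how that loss can be wired into a word-RNN output layer (assuming TensorFlow 1.x, where tf.contrib.legacy_seq2seq is still available; all sizes, variable names and the zero-filled stand-in tensors are illustrative only):

    import tensorflow as tf

    vocab_size = 10000   # placeholder; in practice this would be the full ~300k word vocabulary
    batch_size = 32
    num_steps = 20
    hidden_size = 200

    # `outputs` would come from the RNN: one tensor per time step,
    # each of shape [batch_size, hidden_size]. Zeros are stand-ins here.
    outputs = [tf.zeros([batch_size, hidden_size]) for _ in range(num_steps)]

    # Project each hidden state to vocabulary-sized logits.
    softmax_w = tf.get_variable("softmax_w", [hidden_size, vocab_size])
    softmax_b = tf.get_variable("softmax_b", [vocab_size])
    logits = [tf.matmul(o, softmax_w) + softmax_b for o in outputs]

    # Targets are plain int32 word indices -- no one-hot encoding needed.
    targets = [tf.zeros([batch_size], dtype=tf.int32) for _ in range(num_steps)]
    weights = [tf.ones([batch_size]) for _ in range(num_steps)]

    loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example(logits, targets, weights)
    cost = tf.reduce_sum(loss) / batch_size

The softmax projection matrix is still hidden_size x vocab_size, which is why the model itself does not get smaller; the saving comes from keeping the labels as sparse integer indices rather than one-hot vectors.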
