
TensorFlow BoW Encoder Explanation

Can somebody explain to me what the TensorFlow BoW encoder is doing and returning? I would expect to get a vector of word counts per document (as in sklearn); however, it apparently does something fancier.
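
For reference, the sklearn behaviour I mean is plain per-document word counts, e.g. via CountVectorizer (the toy documents below are made up purely for illustration):

from sklearn.feature_extraction.text import CountVectorizer

# Classic bag-of-words: each document becomes a vector of raw word counts.
docs = ["the cat sat", "the cat saw the cat"]
vec = CountVectorizer()
X = vec.fit_transform(docs)

print(vec.get_feature_names_out())  # ['cat' 'sat' 'saw' 'the']
print(X.toarray())                  # [[1 1 0 1]
                                    #  [2 0 1 2]]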

In this example:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/text_classification.py

from tensorflow.contrib.layers.python.layers import encoders  # import used in the linked example

features = encoders.bow_encoder(
  features, vocab_size=n_words, embed_dim=EMBEDDING_SIZE)

An 'embed_dim' is passed, and I also don't understand what this does in the context of a BoW encoding. The documentation is sadly not very helpful. I could certainly work through the TensorFlow code, but I would appreciate a high-level explanation.

In the classic BoW model, each word is represented by an ID, which corresponds to a sparse one-hot vector. bow_encoder maps these sparse vectors to a dense layer whose size is specified by "embed_dim". It is used to learn a dense vector representation for a word or a text (as, for example, in the word2vec model).
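
A minimal NumPy sketch of that mapping (the sizes are made up; "vocab_size" and "embed_dim" play the same roles as in the question): a word ID is conceptually a one-hot vector, and multiplying it by the embedding matrix is just a row lookup.

import numpy as np

vocab_size, embed_dim = 5, 3                        # illustrative sizes
embeddings = np.random.rand(vocab_size, embed_dim)  # the learned embedding matrix

word_id = 2
one_hot = np.zeros(vocab_size)  # the sparse representation of the word
one_hot[word_id] = 1.0

# Multiplying the one-hot vector by the matrix selects a single dense row:
assert np.allclose(one_hot @ embeddings, embeddings[word_id])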

From the TensorFlow documentation on bow_encoder: "Maps a sequence of symbols to a vector per example by averaging embeddings."

Thus: if the input to bow_encoder is a single word, it is simply mapped to its embedding vector, while a sentence (or longer text) is mapped word by word and the resulting embedding vectors are averaged into one vector.
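
In code, that averaging amounts to an embedding lookup followed by a mean over the word axis. A NumPy sketch, reusing the made-up matrix from above:

import numpy as np

vocab_size, embed_dim = 5, 3
embeddings = np.random.rand(vocab_size, embed_dim)

sentence = np.array([2, 0, 4])                  # a "document" of three word IDs
doc_vector = embeddings[sentence].mean(axis=0)  # look up and average the embeddings

print(doc_vector.shape)  # (3,) -- one dense vector of size embed_dim per example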
