
TensorFlow BoW Encoder Explanation

Can somebody explain to me what the TensorFlow BoW encoder is doing and returning? I would expect to get a vector of word counts per document (as in sklearn); however, it apparently does something fancier.
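
For reference, the sklearn behaviour I mean is plain per-document word counts, e.g. via CountVectorizer (the toy documents below are made up purely for illustration):

from sklearn.feature_extraction.text import CountVectorizer

# Classic bag-of-words: each document becomes a vector of raw word counts.
docs = ["the cat sat", "the cat saw the cat"]
vec = CountVectorizer()
X = vec.fit_transform(docs)

print(vec.get_feature_names_out())  # ['cat' 'sat' 'saw' 'the']
print(X.toarray())                  # [[1 1 0 1]
                                    #  [2 0 1 2]]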

In this example:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/text_classification.py

from tensorflow.contrib.layers.python.layers import encoders  # import used in the linked example

features = encoders.bow_encoder(
  features, vocab_size=n_words, embed_dim=EMBEDDING_SIZE)

An 'embed_dim' is passed, and I also don't understand what this does in the context of a BoW encoding. The documentation is sadly not very helpful. I could certainly work through the TensorFlow code, but I would appreciate a high-level explanation.

In the classic BoW model, each word is represented by an ID, which corresponds to a sparse one-hot vector. bow_encoder maps these sparse vectors to a dense layer whose size is specified by "embed_dim". It is used to learn a dense vector representation for a word or a text (as, for example, in the word2vec model).
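
A minimal NumPy sketch of that mapping (the sizes are made up; "vocab_size" and "embed_dim" play the same roles as in the question): a word ID is conceptually a one-hot vector, and multiplying it by the embedding matrix is just a row lookup.

import numpy as np

vocab_size, embed_dim = 5, 3                        # illustrative sizes
embeddings = np.random.rand(vocab_size, embed_dim)  # the learned embedding matrix

word_id = 2
one_hot = np.zeros(vocab_size)  # the sparse representation of the word
one_hot[word_id] = 1.0

# Multiplying the one-hot vector by the matrix selects a single dense row:
assert np.allclose(one_hot @ embeddings, embeddings[word_id])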

From the TensorFlow documentation on bow_encoder: "Maps a sequence of symbols to a vector per example by averaging embeddings."

Thus: if the input to bow_encoder is a single word, it is simply mapped to its embedding vector, while a sentence (or longer text) is mapped word by word and the resulting embedding vectors are averaged into one vector.
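
In code, that averaging amounts to an embedding lookup followed by a mean over the word axis. A NumPy sketch, reusing the made-up matrix from above:

import numpy as np

vocab_size, embed_dim = 5, 3
embeddings = np.random.rand(vocab_size, embed_dim)

sentence = np.array([2, 0, 4])                  # a "document" of three word IDs
doc_vector = embeddings[sentence].mean(axis=0)  # look up and average the embeddings

print(doc_vector.shape)  # (3,) -- one dense vector of size embed_dim per example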
