Can somebody explain to me what the TensorFlow bow_encoder is doing/returning? I would expect it to return a vector of word counts per document (as in sklearn), but apparently it does something fancier.
In this example:
features = encoders.bow_encoder(
    features, vocab_size=n_words, embed_dim=EMBEDDING_SIZE)
An 'embed_dim' is passed, and I also don't understand what it does in the context of a BoW encoding. The documentation is sadly not very helpful. I could certainly work through the TensorFlow code, but I would appreciate a high-level explanation.
In the classic BoW model each word is represented by an ID (a sparse vector). bow_encoder maps these sparse vectors to a dense layer whose size is specified by 'embed_dim'. In other words, bow_encoder is used to learn a dense vector representation for a word or a text (as, e.g., in the word2vec model).
From the TensorFlow documentation on bow_encoder: "Maps a sequence of symbols to a vector per example by averaging embeddings."
Thus: if the input to bow_encoder is a single word, that word is simply mapped to its embedding vector. A sentence (or text) is mapped word by word, and the resulting embedding vectors are averaged into one vector per example.
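The lookup-then-average behavior can be sketched in plain NumPy. This is only an illustration of the idea, not the TensorFlow implementation; the names (`embeddings`, `bow_encode`) and the random matrix are assumptions, and in the real model the embedding matrix is a trainable variable learned during training:

```python
import numpy as np

vocab_size, embed_dim = 10, 4
rng = np.random.default_rng(0)
# Stand-in for the learned embedding matrix (one row per word ID).
embeddings = rng.normal(size=(vocab_size, embed_dim))

def bow_encode(word_ids):
    """Look up the embedding row for each word ID and average the rows."""
    return embeddings[word_ids].mean(axis=0)

single = bow_encode([3])          # one word: just its embedding row
sentence = bow_encode([3, 7, 1])  # text: the mean of three embedding rows
```

Here `single` equals row 3 of the embedding matrix, while `sentence` is a single `embed_dim`-sized vector regardless of how many words the text contains, which is why every example ends up with a fixed-size dense representation.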