Bag of Words embedding layer in Keras?

I have a very simple Keras model that looks like:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(hidden_size, input_dim=n_inputs, activation='relu'))
model.add(Dense(n_outputs, activation='softmax'))

The embedding that I am using is Bag of Words.

I want to include the embedding step as part of the model. I thought of doing it as an embedding layer, but I don't know whether it is possible to implement a Bag of Words model as a Keras Embedding layer. I know you can pass pre-trained BoW and GloVe embedding models to Embedding layers, so I was wondering if something like that could be done with BoW?

Any ideas will be much appreciated! :D

The embedding layer in Keras (and basically all deep learning frameworks) does a lookup: for a token index, it returns a dense embedding.
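For instance, a minimal sketch in tf.keras (the vocabulary size, embedding dimension, and token indices below are arbitrary placeholders):

import numpy as np
from tensorflow.keras.layers import Embedding

# A trainable lookup table: one 64-dimensional vector per token index.
embedding = Embedding(input_dim=1000, output_dim=64)  # vocab of 1000 tokens
token_ids = np.array([[4, 27, 512]])                  # batch of 1 sequence of 3 indices
vectors = embedding(token_ids)                        # shape: (1, 3, 64)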

The question is how you want to embed a bag-of-words representation. I think one of the reasonable options would be:

  1. Do the embedding lookup for every word,
  2. Average the token embeddings and thus get a single vector representing the BoW. In Keras, you can use GlobalAveragePooling1D for that (see the sketch below).

Averaging is probably a better option than summing, because the output will be of the same scale for sequences of different lengths.
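Putting the two steps together, a sketch of what the model could look like (vocab_size and embedding_dim are hypothetical hyperparameters; hidden_size and n_outputs are the ones from the question):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, GlobalAveragePooling1D

model = Sequential()
# Lookup: (batch, seq_len) integer indices -> (batch, seq_len, embedding_dim)
model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim))
# Average over the sequence axis -> (batch, embedding_dim), a BoW-style vector
model.add(GlobalAveragePooling1D())
model.add(Dense(hidden_size, activation='relu'))
model.add(Dense(n_outputs, activation='softmax'))

If the sequences are padded, passing mask_zero=True to the Embedding layer makes GlobalAveragePooling1D average only over the real tokens in tf.keras versions where that layer supports masking.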

Note that for the embedding lookup, the input needs to have a shape of batch × sequence length, with integers corresponding to token indices in a vocabulary.
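One way to produce such input is Keras' own Tokenizer and pad_sequences utilities (a sketch; the toy corpus and maxlen are made up):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["the cat sat on the mat", "dogs bark"]  # hypothetical toy corpus
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)          # lists of token indices
x = pad_sequences(sequences, maxlen=10, padding='post')  # shape: (2, 10)
vocab_size = len(tokenizer.word_index) + 1               # +1 because index 0 is padding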
