
How to use your own word embedding with a pre-trained embedding like word2vec in Keras

I have a co-occurrence matrix stored in a CSV file that records the relationship between words and emojis, like this:

word  emo1  emo2  emo3
w1    0.5   0.3   0.2
w2    0.8   0     0
w3    0.2   0.5   0.2
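
Such a file can be loaded into a weight matrix with pandas; a minimal sketch, with an illustrative file name:

import pandas as pd

# 'word' becomes the index; the remaining columns hold the emoji scores.
cooc_df = pd.read_csv('cooccurrence.csv', index_col='word')
cooc_matrix = cooc_df.values.astype('float32')  # shape: (n_words, n_emojis)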

This co-occurrence matrix is huge: it has 1584755 rows and 621 columns. I have a Sequential() LSTM model in Keras in which I use a pre-trained (word2vec) word embedding. Now I would like to use the co-occurrence matrix as another embedding layer. How can I do that? My current code looks like this:

from keras.models import Sequential
from keras.layers import (Embedding, Dropout, Conv1D, MaxPooling1D,
                          LSTM, Dense, Activation)

model = Sequential()
# Embedding initialised with the pre-trained word2vec weight matrix.
model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len,
                    weights=[embedding_weights]))
model.add(Dropout(0.25))
model.add(Conv1D(filters=nb_filter, kernel_size=filter_length,
                 padding='valid', activation='relu', strides=1))
model.add(MaxPooling1D(pool_size=pool_length))
model.add(LSTM(embeddings_dim))
model.add(Dense(reg_dimensions))
model.add(Activation('sigmoid'))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(train_sequences, train_labels, epochs=30, batch_size=16)

Also, if the co-occurrence matrix is sparse, what would be the best way to use it in the embedding layer?

You can use the Embedding layer and set your own weight matrix like this:

Embedding(n_in, n_out, trainable=False, weights=[weights])

If I understood you correctly, weights will be your co-occurrence matrix, n_in the number of rows, and n_out the number of columns.
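
To combine this with your existing word2vec embedding, i.e. to look up each token in both matrices, you could switch from Sequential() to the functional API and concatenate the two lookups. A minimal sketch, assuming vocab_size equals the number of rows in both weight matrices and cooc_matrix is the dense co-occurrence matrix (all names are illustrative):

from keras.models import Model
from keras.layers import Input, Embedding, Concatenate, LSTM, Dense

tokens = Input(shape=(max_sent_len,), dtype='int32')

# Lookup 1: pre-trained word2vec vectors, frozen.
w2v = Embedding(vocab_size, embeddings_dim,
                weights=[embedding_weights], trainable=False)(tokens)

# Lookup 2: one row of the co-occurrence matrix per word, also frozen.
cooc = Embedding(vocab_size, cooc_matrix.shape[1],
                 weights=[cooc_matrix], trainable=False)(tokens)

# Each token is now its word2vec vector concatenated with its co-occurrence vector.
merged = Concatenate(axis=-1)([w2v, cooc])

x = LSTM(embeddings_dim)(merged)
output = Dense(reg_dimensions, activation='sigmoid')(x)

model = Model(inputs=tokens, outputs=output)
model.compile(loss='mean_absolute_error', optimizer='adam')

trainable=False keeps the co-occurrence rows fixed during training; drop it if you want them fine-tuned along with the rest of the model.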

You can find some more information and examples in this blog post.
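
Regarding the sparse case: the weights argument of an Embedding layer must be a dense NumPy array, so a sparse co-occurrence matrix has to be densified when the model is built. At this size (1584755 x 621) the dense copy is large, so it can help to compress the columns first, for example with truncated SVD, which accepts SciPy sparse input directly. A sketch under those assumptions; the file name and component count are illustrative:

import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from keras.layers import Embedding

cooc_sparse = sparse.load_npz('cooc.npz')    # assumed shape: (1584755, 621)

# Option 1: densify as-is (about 1584755 * 621 float32 values, roughly 4 GB).
# cooc_dense = cooc_sparse.toarray().astype('float32')

# Option 2: reduce the 621 columns to fewer dimensions before densifying.
svd = TruncatedSVD(n_components=100)         # 100 is an arbitrary choice
cooc_dense = svd.fit_transform(cooc_sparse).astype('float32')

emb = Embedding(cooc_dense.shape[0], cooc_dense.shape[1],
                weights=[cooc_dense], trainable=False)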
