How to use own word embedding with pre-trained embedding like word2vec in Keras
I have a co-occurrence matrix stored in a CSV file which contains the relationship between words and emojis like this:
word emo1 emo2 emo3
w1 0.5 0.3 0.2
w2 0.8 0 0
w3 0.2 0.5 0.2
This co-occurrence matrix is huge: it has 1584755 rows and 621 columns.
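For reference, here is a minimal sketch of how such a matrix could be loaded into a NumPy array; the file name cooccurrence.csv and the assumption that the first column holds the words are illustrative, not from the original data:

import numpy as np
import pandas as pd

# Hypothetical file name and layout: first column holds the words.
df = pd.read_csv('cooccurrence.csv', index_col=0)
cooc_weights = df.values.astype(np.float32)               # shape: (1584755, 621)
word_index = dict((w, i) for i, w in enumerate(df.index))  # word -> row index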
I have a Sequential() LSTM model in Keras where I use a pre-trained (word2vec) word embedding. Now I would like to use the co-occurrence matrix as another embedding layer. How can I do that? My current code is something like this:
# Keras 1.x-style API (nb_filter, border_mode, nb_epoch, ...)
from keras.models import Sequential
from keras.layers import Embedding, Dropout, Convolution1D, MaxPooling1D, LSTM, Dense, Activation

model = Sequential()
# pre-trained word2vec vectors passed in as the initial embedding weights
model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len, weights=[embedding_weights]))
model.add(Dropout(0.25))
model.add(Convolution1D(nb_filter=nb_filter, filter_length=filter_length, border_mode='valid', activation='relu', subsample_length=1))
model.add(MaxPooling1D(pool_length=pool_length))
model.add(LSTM(embeddings_dim))
model.add(Dense(reg_dimensions))
model.add(Activation('sigmoid'))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(train_sequences, train_labels, nb_epoch=30, batch_size=16)
Also, if the co-occurrence matrix is sparse then what would be the best way to use it in the embedding layer?
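One possible way to handle the sparse case, sketched under the assumption that the matrix is kept as a scipy.sparse CSR matrix: Keras expects dense NumPy arrays for layer weights, so the matrix would need to be densified before being handed to the layer (at 1584755 x 621 float32 values that is close to 4 GB of RAM):

import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical: co-occurrence counts kept in CSR form to save memory.
sparse_cooc = csr_matrix((1584755, 621), dtype=np.float32)

# Keras takes dense NumPy arrays as layer weights, so densify first.
dense_weights = sparse_cooc.toarray()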
You can use the Embedding layer and set your own weight matrix like this:
Embedding(n_in, n_out, trainable=False, weights=[weights])
If I understood you correctly, weights will be your co-occurrence matrix, n_in the number of rows, and n_out the number of columns.
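Putting the two together, a minimal sketch of a second, frozen embedding layer initialized from the co-occurrence matrix (cooc_weights is the hypothetical dense array loaded above; max_sent_len comes from the question):

from keras.models import Sequential
from keras.layers import Embedding

# cooc_weights: hypothetical dense co-occurrence array (rows = vocab, cols = dims)
n_in, n_out = cooc_weights.shape

emo_model = Sequential()
# trainable=False keeps the co-occurrence weights frozen during training
emo_model.add(Embedding(n_in, n_out, input_length=max_sent_len,
                        weights=[cooc_weights], trainable=False))

If both embeddings need to feed the same model, the outputs of the two embedding layers can be concatenated before the LSTM, for example with Keras's functional API (or the old Merge layer in Keras 1.x).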
You can find some more information and examples in this blog post.