
How to use my own word embedding together with a pre-trained embedding like word2vec in Keras

I have a co-occurrence matrix stored in a CSV file that captures the relationships between words and emojis, like this:

word emo1 emo2 emo3
w1   0.5   0.3  0.2
w2   0.8   0    0
w3   0.2   0.5  0.2
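
For reference, here is a minimal sketch of how such a CSV could be loaded into a NumPy weight matrix whose rows follow the model's word indices; cooccurrence.csv and word_index (a Keras-Tokenizer-style word-to-index map) are placeholder names, not taken from the question:

import numpy as np
import pandas as pd

# Words in the first column, one column per emoji (621 of them)
df = pd.read_csv('cooccurrence.csv', index_col=0)

# Row i must correspond to the integer index the model's input
# sequences use for word i; row 0 is reserved for padding.
cooc_weights = np.zeros((len(word_index) + 1, df.shape[1]), dtype='float32')
for word, i in word_index.items():
    if word in df.index:
        cooc_weights[i] = df.loc[word].values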

This co-occurrence matrix is huge: it has 1,584,755 rows and 621 columns. I have a Sequential() LSTM model in Keras in which I use a pre-trained (word2vec) word embedding. Now I would like to use the co-occurrence matrix as another embedding layer. How can I do that? My current code looks like this:

model = Sequential()
# Embedding layer initialised with the pre-trained word2vec weights
model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len, weights=[embedding_weights]))
model.add(Dropout(0.25))
# 1D convolution over the embedded sequence, then pooling and an LSTM
model.add(Convolution1D(nb_filter=nb_filter, filter_length=filter_length, border_mode='valid', activation='relu', subsample_length=1))
model.add(MaxPooling1D(pool_length=pool_length))
model.add(LSTM(embeddings_dim))
model.add(Dense(reg_dimensions))
model.add(Activation('sigmoid'))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(train_sequences, train_labels, nb_epoch=30, batch_size=16)

Also, if the co-occurrence matrix is sparse, what would be the best way to use it in an embedding layer?

You can use the Embedding layer and set your own weight matrix like this:

Embedding(n_in, n_out, trainable=False, weights=[weights])

If I understood you correctly, weights will be your co-occurrence matrix, n_in the number of rows (the vocabulary size), and n_out the number of columns.
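
Since Sequential() only allows a single chain of layers, one way to plug the co-occurrence matrix in as a second embedding is the functional API. Below is a minimal sketch, assuming cooc_weights is the co-occurrence matrix as a NumPy array whose rows are aligned with the same word indices as embedding_weights (it uses the Keras 2 argument names; your code uses the older 1.x ones):

import numpy as np
from keras.models import Model
from keras.layers import (Input, Embedding, Concatenate, Dropout,
                          Conv1D, MaxPooling1D, LSTM, Dense)

tokens = Input(shape=(max_sent_len,), dtype='int32')

# Pre-trained word2vec embedding, as in your current model
w2v = Embedding(max_features, embeddings_dim,
                weights=[embedding_weights], trainable=False)(tokens)

# Second embedding whose rows come from the co-occurrence matrix;
# both lookups must use the same word-to-index mapping
cooc = Embedding(cooc_weights.shape[0], cooc_weights.shape[1],
                 weights=[cooc_weights], trainable=False)(tokens)

# Each token is now represented by embeddings_dim + 621 features
x = Concatenate(axis=-1)([w2v, cooc])
x = Dropout(0.25)(x)
x = Conv1D(filters=nb_filter, kernel_size=filter_length,
           padding='valid', activation='relu')(x)
x = MaxPooling1D(pool_size=pool_length)(x)
x = LSTM(embeddings_dim)(x)
out = Dense(reg_dimensions, activation='sigmoid')(x)

model = Model(inputs=tokens, outputs=out)
model.compile(loss='mean_absolute_error', optimizer='adam')

Setting trainable=False keeps both weight matrices fixed during training; drop it if you want them fine-tuned.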

You can find some more information and examples in this blog post.
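
Regarding the sparse case: the Embedding layer stores its weights as a dense tensor, so a sparse co-occurrence matrix has to be densified (or reduced) before it can be passed as weights. A minimal sketch, assuming the matrix is kept as a SciPy CSR matrix in a file named cooc.npz (a placeholder name):

import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

cooc_sparse = sparse.load_npz('cooc.npz')  # placeholder file name

# Option 1: densify directly. At 1584755 x 621 float32 this is
# roughly 3.9 GB, so it only works if you have the RAM.
cooc_weights = cooc_sparse.toarray().astype('float32')

# Option 2: reduce the 621 emoji dimensions first; TruncatedSVD
# accepts sparse input and yields a much smaller dense matrix.
svd = TruncatedSVD(n_components=100)
cooc_weights = svd.fit_transform(cooc_sparse).astype('float32')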
