
How to use my own word embedding together with a pre-trained embedding like word2vec in Keras

I have a co-occurrence matrix stored in a CSV file that captures the relationships between words and emojis, like this:

word emo1 emo2 emo3
w1   0.5   0.3  0.2
w2   0.8   0    0
w3   0.2   0.5  0.2
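
For reference, here is a minimal sketch of how such a CSV could be loaded into a NumPy weight matrix whose rows follow the model's word indices; cooccurrence.csv and word_index (a Keras-Tokenizer-style word-to-index map) are placeholder names, not taken from the question:

import numpy as np
import pandas as pd

# Words in the first column, one column per emoji (621 of them)
df = pd.read_csv('cooccurrence.csv', index_col=0)

# Row i must correspond to the integer index the model's input
# sequences use for word i; row 0 is reserved for padding.
cooc_weights = np.zeros((len(word_index) + 1, df.shape[1]), dtype='float32')
for word, i in word_index.items():
    if word in df.index:
        cooc_weights[i] = df.loc[word].values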

This co-occurrence matrix is huge: it has 1,584,755 rows and 621 columns. I have a Sequential() LSTM model in Keras in which I use a pre-trained (word2vec) word embedding. Now I would like to use the co-occurrence matrix as another embedding layer. How can I do that? My current code looks like this:

model = Sequential()
# Embedding layer initialised with the pre-trained word2vec weights
model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len, weights=[embedding_weights]))
model.add(Dropout(0.25))
# 1D convolution over the embedded sequence, then pooling and an LSTM
model.add(Convolution1D(nb_filter=nb_filter, filter_length=filter_length, border_mode='valid', activation='relu', subsample_length=1))
model.add(MaxPooling1D(pool_length=pool_length))
model.add(LSTM(embeddings_dim))
model.add(Dense(reg_dimensions))
model.add(Activation('sigmoid'))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(train_sequences, train_labels, nb_epoch=30, batch_size=16)

Also, if the co-occurrence matrix is sparse, what would be the best way to use it in an embedding layer?

You can use the Embedding layer and set your own weight matrix like this:

Embedding(n_in, n_out, trainable=False, weights=[weights])

If I understood you correctly, weights will be your co-occurrence matrix, n_in the number of rows (the vocabulary size), and n_out the number of columns.
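
Since Sequential() only allows a single chain of layers, one way to plug the co-occurrence matrix in as a second embedding is the functional API. Below is a minimal sketch, assuming cooc_weights is the co-occurrence matrix as a NumPy array whose rows are aligned with the same word indices as embedding_weights (it uses the Keras 2 argument names; your code uses the older 1.x ones):

import numpy as np
from keras.models import Model
from keras.layers import (Input, Embedding, Concatenate, Dropout,
                          Conv1D, MaxPooling1D, LSTM, Dense)

tokens = Input(shape=(max_sent_len,), dtype='int32')

# Pre-trained word2vec embedding, as in your current model
w2v = Embedding(max_features, embeddings_dim,
                weights=[embedding_weights], trainable=False)(tokens)

# Second embedding whose rows come from the co-occurrence matrix;
# both lookups must use the same word-to-index mapping
cooc = Embedding(cooc_weights.shape[0], cooc_weights.shape[1],
                 weights=[cooc_weights], trainable=False)(tokens)

# Each token is now represented by embeddings_dim + 621 features
x = Concatenate(axis=-1)([w2v, cooc])
x = Dropout(0.25)(x)
x = Conv1D(filters=nb_filter, kernel_size=filter_length,
           padding='valid', activation='relu')(x)
x = MaxPooling1D(pool_size=pool_length)(x)
x = LSTM(embeddings_dim)(x)
out = Dense(reg_dimensions, activation='sigmoid')(x)

model = Model(inputs=tokens, outputs=out)
model.compile(loss='mean_absolute_error', optimizer='adam')

Setting trainable=False keeps both weight matrices fixed during training; drop it if you want them fine-tuned.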

You can find some more information and examples in this blog post.
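
Regarding the sparse case: the Embedding layer stores its weights as a dense tensor, so a sparse co-occurrence matrix has to be densified (or reduced) before it can be passed as weights. A minimal sketch, assuming the matrix is kept as a SciPy CSR matrix in a file named cooc.npz (a placeholder name):

import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

cooc_sparse = sparse.load_npz('cooc.npz')  # placeholder file name

# Option 1: densify directly. At 1584755 x 621 float32 this is
# roughly 3.9 GB, so it only works if you have the RAM.
cooc_weights = cooc_sparse.toarray().astype('float32')

# Option 2: reduce the 621 emoji dimensions first; TruncatedSVD
# accepts sparse input and yields a much smaller dense matrix.
svd = TruncatedSVD(n_components=100)
cooc_weights = svd.fit_transform(cooc_sparse).astype('float32')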
