I have a co-occurrence matrix, stored in a CSV file, that captures the relationship between words and emojis, like this:
word  emo1  emo2  emo3
w1    0.5   0.3   0.2
w2    0.8   0     0
w3    0.2   0.5   0.2
This co-occurrence matrix is huge: it has 1584755 rows and 621 columns. I have a Sequential() LSTM model in Keras in which I use a pre-trained (word2vec) word embedding. Now I would like to use the co-occurrence matrix as another embedding layer. How can I do that? My current code looks like this:
model = Sequential()
model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len, weights=[embedding_weights]))
model.add(Dropout(0.25))
model.add(Conv1D(filters=nb_filter, kernel_size=filter_length, padding='valid', activation='relu', strides=1))
model.add(MaxPooling1D(pool_size=pool_length))
model.add(LSTM(embeddings_dim))
model.add(Dense(reg_dimensions))
model.add(Activation('sigmoid'))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(train_sequences, train_labels, epochs=30, batch_size=16)
Also, if the co-occurrence matrix is sparse, what would be the best way to use it in the embedding layer?
You can use the Embedding layer and set your own weight matrix like this:
Embedding(n_in, n_out, trainable=False, weights=[weights])
If I understood you correctly, weights will be your co-occurrence matrix, n_in the number of rows, and n_out the number of columns.
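To illustrate, here is a minimal sketch using the functional API. A tiny random matrix stands in for the real 1584755 × 621 co-occurrence matrix, and all sizes (`vocab_size`, `n_emojis`, `embeddings_dim`, `max_sent_len`) are made-up placeholders. It freezes the co-occurrence rows into one Embedding layer and concatenates it with a second, word2vec-style embedding:

```python
import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Concatenate, Embedding, Input
from tensorflow.keras.models import Model

# Tiny stand-ins for the real weight matrices:
# row i of cooc is the emoji profile of word i.
vocab_size, n_emojis, embeddings_dim, max_sent_len = 10, 4, 8, 6
cooc = np.random.rand(vocab_size, n_emojis).astype("float32")
w2v = np.random.rand(vocab_size, embeddings_dim).astype("float32")

inp = Input(shape=(max_sent_len,))

# Trainable word2vec embedding, as in the original model.
w2v_emb = Embedding(vocab_size, embeddings_dim,
                    embeddings_initializer=Constant(w2v))(inp)

# Frozen co-occurrence embedding: word index i looks up cooc[i].
cooc_emb = Embedding(vocab_size, n_emojis,
                     embeddings_initializer=Constant(cooc),
                     trainable=False)(inp)

# Each timestep now carries both representations side by side.
merged = Concatenate()([w2v_emb, cooc_emb])

model = Model(inp, merged)
out = model.predict(np.array([[0, 1, 2, 3, 4, 5]]), verbose=0)
print(out.shape)  # (1, 6, 12): 8 word2vec dims + 4 co-occurrence dims
```

The merged tensor can then feed the existing Conv1D/LSTM stack in place of the single embedding; this does require switching from Sequential to the functional Model API, since one input flows through two embedding layers.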
You can find some more information and examples in this blog post.
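For the sparse case: the Embedding layer needs a dense weight matrix, so one common option (my suggestion, not part of the answer above; sizes are illustrative) is to compress the sparse 621-column matrix into a lower-dimensional dense matrix first, e.g. with truncated SVD, and use the result as the embedding weights:

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Illustrative sparse co-occurrence matrix (the real one is 1584755 x 621).
cooc = sparse.random(1000, 621, density=0.01, format="csr", random_state=0)

# Compress the 621 sparse columns into 50 dense dimensions; the resulting
# array can be passed to an Embedding layer as its weight matrix.
svd = TruncatedSVD(n_components=50, random_state=0)
dense_weights = svd.fit_transform(cooc).astype("float32")
print(dense_weights.shape)  # (1000, 50)
```

This keeps the memory footprint manageable for a matrix of that size, at the cost of an approximation of the original co-occurrence counts.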