I used the following code to load GloVe vectors as word embeddings:
from gensim.scripts.glove2word2vec import glove2word2vec  # conversion helper
glove_input_file = 'glove.840B.300d.txt'  # pretrained GloVe vectors, text format
word2vec_output_file = 'glove.word2vec'  # destination for the converted file
glove2word2vec(glove_input_file, word2vec_output_file)  # write a word2vec-format copy
from gensim.models import KeyedVectors
glove_w2vec = KeyedVectors.load_word2vec_format(word2vec_output_file, binary=False)  # load the converted text file
I understand that this chunk of code is for using pretrained GloVe vectors as word embeddings, but I am not sure what is happening in each line. Why convert GloVe to word2vec format? What exactly does KeyedVectors.load_word2vec_format do?
Both the GloVe algorithm and word2vec create word vectors: one vector per word.
But the formats for storing those vectors are slightly different: the word2vec text format starts with a header line giving the number of vectors and their dimensionality, which a plain GloVe text file lacks. The gensim glove2word2vec() function lets you convert a file in GloVe format to the format used by the original Google word2vec.c code.
https://radimrehurek.com/gensim/scripts/glove2word2vec.html
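To make the format difference concrete, here is a minimal sketch of what such a conversion amounts to for text files. The helper name `glove_to_word2vec_text` and the two sample vectors are made up for illustration; this is not gensim's actual implementation, only the idea of prepending a "count dimensions" header.

```python
import io


def glove_to_word2vec_text(glove_lines):
    """Hypothetical helper: prepend the word2vec header to GloVe-format lines.

    Each GloVe line is assumed to be "<word> <v1> <v2> ...", space separated.
    """
    rows = [line.rstrip("\n") for line in glove_lines if line.strip()]
    if not rows:
        raise ValueError("no vectors found")
    # Dimensionality = number of fields after the word itself.
    dims = len(rows[0].split(" ")) - 1
    header = f"{len(rows)} {dims}"
    return [header] + rows


# Two made-up 2-dimensional vectors standing in for a real GloVe file.
glove_sample = io.StringIO("king 1.0 0.0\nqueen 0.9 0.1\n")
converted = glove_to_word2vec_text(glove_sample)
print("\n".join(converted))
```

The vector lines themselves are unchanged; only the first line is new.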
Meanwhile, the gensim KeyedVectors.load_word2vec_format() method can load vectors in that word2vec.c format into an instance of KeyedVectors (or one of its same-interface subclasses), for easy lookup and other common word-vector operations.