
Understanding usage of glove vectors

I used the following code to load GloVe vectors as word embeddings:

from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec

glove_input_file = 'glove.840B.300d.txt'   # pretrained vectors in GloVe text format
word2vec_output_file = 'glove.word2vec'    # destination file in word2vec text format

# Write a copy of the GloVe file in word2vec text format.
glove2word2vec(glove_input_file, word2vec_output_file)

# Load the converted text file into a KeyedVectors instance.
glove_w2vec = KeyedVectors.load_word2vec_format(word2vec_output_file, binary=False)

I understand that this code loads pretrained GloVe vectors for use as word embeddings, but I am not sure what is happening in each line. Why convert GloVe to word2vec format? What exactly does KeyedVectors.load_word2vec_format do?

The GloVe algorithm and word2vec both create word-vectors: one vector per word.

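But the formats for storing those vectors are slightly different. In the text versions, the word2vec format starts with a header line giving the number of vectors and their dimensionality, while a GloVe file has no header and goes straight to the "word value value ..." lines. The gensim glove2word2vec() function lets you convert a file in GloVe format to the format used by the original Google word2vec.c code.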
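To make the format difference concrete, here is a minimal sketch of the conversion in plain Python, using a two-word toy file. The function name glove_to_word2vec_text and the toy file names are made up for illustration; this is not gensim's implementation, only an approximation of what the conversion amounts to.

```python
import os
import tempfile


def glove_to_word2vec_text(glove_path, word2vec_path):
    """Hypothetical helper: copy a GloVe text file and prepend the
    '<vector_count> <dimensions>' header that the word2vec text format expects."""
    with open(glove_path, encoding="utf-8") as src:
        lines = [line for line in src if line.strip()]
    # Every line is "word v1 v2 ... vN", so N = number of fields minus the word.
    dims = len(lines[0].rstrip().split(" ")) - 1
    with open(word2vec_path, "w", encoding="utf-8") as dst:
        dst.write(f"{len(lines)} {dims}\n")  # the only thing GloVe files lack
        dst.writelines(lines)
    return len(lines), dims


# Toy data standing in for a real GloVe file (values are invented).
tmp_dir = tempfile.mkdtemp()
glove_path = os.path.join(tmp_dir, "toy_glove.txt")
w2v_path = os.path.join(tmp_dir, "toy_word2vec.txt")
with open(glove_path, "w", encoding="utf-8") as f:
    f.write("king 0.1 0.2 0.3\nqueen 0.1 0.25 0.3\n")

count, dims = glove_to_word2vec_text(glove_path, w2v_path)
with open(w2v_path, encoding="utf-8") as f:
    converted = f.read()
print(converted)
```

The sketch reads the whole file into memory, which is fine for a toy file but not for something the size of glove.840B.300d.txt. Also worth knowing: in gensim 4.0 and later, load_word2vec_format() accepts no_header=True, which reads a GloVe text file directly and makes the separate conversion step unnecessary.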

https://radimrehurek.com/gensim/scripts/glove2word2vec.html

Meanwhile, the gensim KeyedVectors.load_word2vec_format() method loads vectors stored in that word2vec.c format into an instance of KeyedVectors (or one of its same-interface subclasses), for easy lookup and other common word-vector operations.
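As a rough picture of what "load, then look up" means, the sketch below parses a word2vec-format text file into a plain dict and runs a cosine-similarity query over it. This is a simplified stand-in, not how gensim does it internally (KeyedVectors stores the vectors in a single array and is far more efficient); the helper names load_word2vec_text and most_similar here are local illustrations, and the toy vectors are invented.

```python
import math
import os
import tempfile


def load_word2vec_text(path):
    """Illustrative loader: parse word2vec text format into {word: [floats]}."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        # Header line: "<vector_count> <dimensions>"
        vocab_size, dims = (int(x) for x in f.readline().split())
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dims + 1:
                continue  # skip malformed lines in this simplified sketch
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(vectors, word, topn=1):
    """Rank all other words by cosine similarity to `word`."""
    target = vectors[word]
    scored = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy file in word2vec text format (values are invented).
toy_path = os.path.join(tempfile.mkdtemp(), "toy_word2vec.txt")
with open(toy_path, "w", encoding="utf-8") as f:
    f.write("3 2\nking 1.0 0.0\nqueen 0.9 0.1\napple 0.0 1.0\n")

toy_vectors = load_word2vec_text(toy_path)
nearest = most_similar(toy_vectors, "king")
print(toy_vectors["king"], nearest)
```

With the real object from the question, the corresponding operations are glove_w2vec['king'] for the vector lookup and glove_w2vec.most_similar('king') for the nearest-neighbour query.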
