Understanding usage of GloVe vectors

I used the following code to load GloVe vectors for word embeddings:

from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec
glove_input_file = 'glove.840B.300d.txt'  # pretrained GloVe vectors (plain text)
word2vec_output_file = 'glove.word2vec'   # destination file in word2vec text format
# Convert the GloVe file to word2vec text format
glove2word2vec(glove_input_file, word2vec_output_file)
# Load the converted file; binary=False because it is plain text
glove_w2vec = KeyedVectors.load_word2vec_format(word2vec_output_file, binary=False)

I understand that this chunk of code is for using GloVe pretrained vectors as word embeddings, but I am not sure what is happening in each line. Why convert GloVe to word2vec format? What exactly does KeyedVectors.load_word2vec_format do?

Both the GloVe algorithm and word2vec create word vectors, one vector per word.

But the formats for storing those vectors are slightly different. The gensim glove2word2vec() function lets you convert a file in GloVe format to the format used by the original Google word2vec.c code.

https://radimrehurek.com/gensim/scripts/glove2word2vec.html
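
To make the "slightly different" concrete: as far as the text formats go, a GloVe file is just one "word v1 v2 ... vN" row per line, while the word2vec text format puts a header line in front giving the number of vectors and their dimensionality. The sketch below illustrates that idea with the standard library only. It is not gensim's implementation, and the function name glove_to_word2vec_text and the sample rows are made up for illustration.

```python
def glove_to_word2vec_text(glove_lines):
    """Return word2vec-text lines for an iterable of GloVe-text lines.

    Hypothetical helper for illustration only; gensim's glove2word2vec()
    works on files and is the function to use in practice.
    """
    # Drop trailing newlines and skip blank lines
    rows = [line.rstrip("\n") for line in glove_lines if line.strip()]
    if not rows:
        raise ValueError("no vectors found")
    # Each GloVe row is "<token> <v1> ... <vN>", so N = field count - 1
    dims = len(rows[0].split(" ")) - 1
    # word2vec text format starts with "<number of vectors> <dimensions>"
    header = f"{len(rows)} {dims}"
    return [header] + rows


# Made-up three-dimensional sample rows, not real GloVe values
glove_sample = [
    "king 0.1 0.2 0.3\n",
    "queen 0.1 0.25 0.3\n",
]
converted = glove_to_word2vec_text(glove_sample)
print("\n".join(converted))
```

The only thing added is the first line, "2 3", and the vector rows pass through unchanged. Recent gensim releases (4.0 and later, to my knowledge) also accept no_header=True in load_word2vec_format, which reads a GloVe text file directly and makes the separate conversion step unnecessary.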

Meanwhile, the gensim KeyedVectors.load_word2vec_format() method can load vectors in that word2vec.c format into an instance of KeyedVectors (or one of its same-interface subclasses), for easy lookup and other common word-vector operations.
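
As a rough picture of what "load, then look up" means, here is a toy reader for the word2vec text format followed by a vector lookup and a nearest-neighbour query. This is a simplified sketch, not how gensim does it internally (gensim stores the vectors in a NumPy array and is far more efficient). The names load_word2vec_text, cosine and most_similar, and the two-dimensional sample data, are invented for the example.

```python
import math


def load_word2vec_text(lines):
    """Parse word2vec-text lines into a {word: [floats]} dict (toy version)."""
    it = iter(lines)
    # Header line: "<number of vectors> <dimensions>"
    count, dims = (int(x) for x in next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dims + 1:
            raise ValueError(f"unexpected row length: {line!r}")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        raise ValueError("header count does not match rows read")
    return vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(vectors, word):
    """Return the other word whose vector is closest by cosine similarity."""
    query = vectors[word]
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(query, vectors[w]))


# Made-up sample in word2vec text format: header, then one row per word
sample = ["3 2", "king 1.0 0.0", "queen 0.9 0.1", "apple 0.0 1.0"]
toy = load_word2vec_text(sample)
print(toy["king"])                # the stored vector for "king"
print(most_similar(toy, "king"))  # nearest other word by cosine similarity
```

With the real object from the question, the corresponding operations would be glove_w2vec['king'] for the vector and glove_w2vec.most_similar('king') for a ranked list of neighbours.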
