Load word vectors from a text file - GENSIM PYTHON

Question

Hello, I have a txt file in this form: the first column is the word and the remaining columns are its vector.

word 0.256 0.2659 0.326595
word1 0.528 0.6589 0.62326 ...

I am trying to load this as KeyedVectors because I want to compute the cosine similarity between the words afterwards and find the most similar words, but I always get an error.

Answer 1

I'm guessing the actual format includes line breaks, like:

word 0.256 0.2659 0.326595
word1 0.528 0.6589 0.62326

That's more or less the format common for GloVe-trained vectors, and very similar to the text format used by Google's original word2vec.c code, which adds a first line with the count of vectors and their dimensionality.
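To make the difference between the two text formats concrete, here is a minimal sketch using only the standard library. It counts the vectors and their dimensionality in a headerless file and writes a copy with the word2vec-style first line prepended. The function name add_word2vec_header and the file names are made up for illustration, and it assumes whitespace-separated fields with no spaces inside the words themselves.

```python
import os
import tempfile


def add_word2vec_header(src_path, dst_path, encoding="utf-8"):
    """Copy a headerless vector file, prepending a '<count> <dims>' line.

    Hypothetical helper for illustration; assumes one 'word v1 v2 ...'
    entry per line, separated by whitespace.
    """
    count, dims = 0, None
    # First pass: count the vectors and check that every line has the same width.
    with open(src_path, encoding=encoding) as src:
        for line in src:
            parts = line.split()
            if not parts:
                continue  # ignore blank lines
            width = len(parts) - 1  # everything after the word is the vector
            if dims is None:
                dims = width
            elif width != dims:
                raise ValueError(f"expected {dims} values, got {width}: {parts[0]!r}")
            count += 1
    # Second pass: write the header, then copy the non-blank lines unchanged.
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        dst.write(f"{count} {dims or 0}\n")
        for line in src:
            if line.strip():
                dst.write(line.rstrip("\n") + "\n")
    return count, dims


# Demo on the two sample lines from the question.
with tempfile.TemporaryDirectory() as tmp_dir:
    src_path = os.path.join(tmp_dir, "vectors.txt")
    dst_path = os.path.join(tmp_dir, "vectors_w2v.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("word 0.256 0.2659 0.326595\nword1 0.528 0.6589 0.62326\n")
    shape = add_word2vec_header(src_path, dst_path)
    with open(dst_path, encoding="utf-8") as f:
        header_line = f.readline().strip()

print(shape, header_line)
```

A file with that first line is in the word2vec.c text format, so no header-related option should be needed when loading it.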

(If your vectors came from one of those tools, or a public place, and there are more hints as to their format from the filename or origin, that would have been helpful to note in your question.)

If I'm guessing your true format correctly, then Gensim's KeyedVectors class can load the GloVe format via the .load_word2vec_format() method, with the optional no_header=True parameter:

from gensim.models import KeyedVectors

# filename is the path to your headerless text file
vecs = KeyedVectors.load_word2vec_format(filename, binary=False, no_header=True)

See the docs for more options:

https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.KeyedVectors.load_word2vec_format

Answer 1 timestamp: 2021-04-30 23:43:33
