
Python gensim: create a word2vec model from vectors (in an ndarray)

I have an ndarray containing words and their corresponding vectors (100 components per word). For example:

Computer 0.11 0.41 ... 0.56
Ball     0.31 0.87 ... 0.32

And so on.

I want to create a word2vec model from it:

model = load_from_ndarray(arr)

How can it be done? I saw KeyedVectors, but it only takes a file and not an array.

There are no existing convenience methods to turn your own array and word list into a KeyedVectors instance, so you'd have to construct one by hand in your own code.

But it's a pretty simple object, mainly one raw array plus a dict mapping words to index positions, and all the source is available:

https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/keyedvectors.py
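To make the "one raw array plus a dict" description concrete, here is a minimal numpy-only sketch of that layout. `TinyKeyedVectors` is a hypothetical illustration, not gensim's class. In gensim 4.x the corresponding real attributes are `vectors`, `key_to_index` and `index_to_key`; gensim 3.x used `vectors`, a `vocab` dict and `index2word` instead.

```python
import numpy as np


class TinyKeyedVectors:
    """Hypothetical minimal stand-in for the KeyedVectors layout:
    one 2-D float array plus a dict from word to row index."""

    def __init__(self, words, vectors):
        vectors = np.asarray(vectors, dtype=np.float32)
        if vectors.ndim != 2 or vectors.shape[0] != len(words):
            raise ValueError("need exactly one vector row per word")
        self.vectors = vectors                    # shape: (n_words, vector_size)
        self.index_to_key = list(words)           # row index -> word
        self.key_to_index = {w: i for i, w in enumerate(self.index_to_key)}
        if len(self.key_to_index) != len(self.index_to_key):
            raise ValueError("duplicate words")

    def __getitem__(self, word):
        # the dict gives the row position; the array holds the actual vector
        return self.vectors[self.key_to_index[word]]

    def similarity(self, w1, w2):
        # cosine similarity between the two stored vectors
        a, b = self[w1], self[w2]
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


kv = TinyKeyedVectors(["Computer", "Ball"],
                      [[0.11, 0.41, 0.56], [0.31, 0.87, 0.32]])
print(kv.key_to_index)  # {'Computer': 0, 'Ball': 1}
print(kv["Ball"])
```

The real class layers similarity search, normalization and saving/loading on top of this same core, which is why hand-constructing one mostly comes down to filling in the array and the word-to-index mapping consistently.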

I would especially suggest one or both of the following strategies:

  • taking a close look at the load_word2vec_format() method, including the similarly named supporting function in the sibling base_any2vec.py file, and following each of the steps they use to read a file and construct a full instance

  • training up a dummy KeyedVectors in one of the supported ways (such as by training Word2Vec on a synthetic corpus that contains exactly the words you need), and then either inspecting that object to understand the necessary parts of a working instance, or mutating that instance in place so that it holds the vector mappings you prefer.
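A related workaround, in the spirit of the first strategy, sidesteps the "file only" limitation by writing the array out in the plain-text word2vec format that load_word2vec_format() reads: a header line with the vocabulary size and vector size, then one line per word containing the word followed by its components. The helper name `write_word2vec_text` is hypothetical, and the sketch assumes no word contains whitespace, since the format is space-separated.

```python
import io

import numpy as np


def write_word2vec_text(words, vectors, out):
    """Hypothetical helper: write words and vectors in plain-text word2vec format.

    `out` is any writable text stream (an open file, io.StringIO, ...).
    Assumes no word contains whitespace.
    """
    vectors = np.asarray(vectors, dtype=np.float32)
    if vectors.ndim != 2 or vectors.shape[0] != len(words):
        raise ValueError("need exactly one vector row per word")
    # header line: "<vocab_size> <vector_size>"
    out.write(f"{vectors.shape[0]} {vectors.shape[1]}\n")
    for word, row in zip(words, vectors):
        if not word or any(ch.isspace() for ch in word):
            raise ValueError(f"word {word!r} is empty or contains whitespace")
        # one line per word: the word, then its components separated by spaces
        out.write(word + " " + " ".join(str(x) for x in row) + "\n")


buf = io.StringIO()
write_word2vec_text(["Computer", "Ball"],
                    [[0.11, 0.41, 0.56], [0.31, 0.87, 0.32]], buf)
print(buf.getvalue())
```

Writing to a real file (for example one created with the `tempfile` module) instead of `io.StringIO` gives a path that `KeyedVectors.load_word2vec_format(path, binary=False)` can read. This costs a round trip through disk, so it is mainly useful for seeing exactly what the loader expects.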

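import numpy as np
from gensim.models import KeyedVectors
# column 0 holds the words, the remaining columns hold the vector components
words = [str(w) for w in myarray[:, 0]]
vectors = np.asarray(myarray[:, 1:], dtype=np.float32)  # object -> float32
model = KeyedVectors(vectors.shape[1])  # vector_size, e.g. 100
model.add_vectors(words, vectors)  # gensim 4.x; gensim 3.x called this add()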

If you want, you can then save it:

model.save('mymodel')

and later just load it again:

model = KeyedVectors.load('mymodel')
