
Is there a way to iterate through the vectors of Gensim's Word2Vec?

I'm trying to perform a simple task that requires iterating over, and working with, specific vectors after loading them into gensim's Word2Vec.

Basically, given a txt file of the form:

t1 -0.11307 -0.63909 -0.35103 -0.17906 -0.12349
t2 0.54553 0.18002 -0.21666 -0.090257 -0.13754
t3 0.22159 -0.13781 -0.37934 0.39926 -0.25967 

where t1 is the name of the vector and the numbers that follow are its components. I load the file using vecs = KeyedVectors.load_word2vec_format(datapath(f), binary=False) .

Now, I want to iterate through the vectors I have and perform a calculation on them; summing all of the vectors is one example. If the file had been read in using with open(f) , I know I could just use .split(' ') on each line, but since this is now a KeyedVectors object, I'm not sure what to do.

I've looked through the word2vec documentation and tried dir(KeyedVectors) , but I'm still not sure whether there is an attribute like KeyedVectors.vectors or something similar that would let me perform this task.

Any tips/help/advice would be much appreciated!

There's a list of all words in the KeyedVectors object in its .index_to_key property. So one way to sum all the vectors would be to retrieve each by name in a list comprehension:

np.sum([vecs[key] for key in vecs.index_to_key], axis=0)

But if all you really want to do is sum the vectors, and the keys (word tokens) aren't an important part of your calculation, the set of all the raw word-vectors is available in the .vectors property, as a numpy array with one vector per row. So you could also do:

np.sum(vecs.vectors, axis=0)
