
word2vec - find a word by a specific vector

I trained a gensim Word2Vec model. Let's say I have a certain vector and I want to find the word it represents. What is the best way to do so?

Meaning, for a specific vector:

vec = array([-0.00449447, -0.00310097,  0.02421786, ...], dtype=float32)

I want to get a word:

word = model.vec2word(vec)  # hypothetical method; desired result: 'computer'

Word vectors are generated through an iterative, approximate process, so they shouldn't be thought of as precisely right (even though they do have exact coordinates), just "useful within certain tolerances".

So there's no lookup of exact-word-for-exact-coordinates. Instead, gensim's Word2Vec and related classes offer most_similar(), which returns the known words closest to the given known words or vector coordinates, in ranked order, along with their cosine similarities. So if you've just trained (or loaded) a full Word2Vec model into the variable model, you can get the closest words to your vector with:

from numpy import array, float32

# The remaining components of the vector are elided here.
vec = array([-0.00449447, -0.00310097,  0.02421786, ...], dtype=float32)
similars = model.wv.most_similar(positive=[vec])
print(similars)

If you just want the single closest word, it'd be in similars[0][0] (the first position of the top-ranked tuple).
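
To make the idea of "closest known word by cosine similarity" concrete, here is a minimal, library-independent sketch of that kind of ranked lookup. It is not gensim's implementation: the helper names cosine and nearest_words and the three toy vectors are invented for illustration, and a real model would have far more dimensions and words.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_words(vec, word_vectors, topn=3):
    """Rank known words by cosine similarity to vec, best first."""
    scored = [(word, cosine(vec, wv)) for word, wv in word_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Toy 3-dimensional vectors, invented for illustration only.
toy_vectors = {
    "computer": [0.9, 0.1, 0.0],
    "laptop":   [0.8, 0.3, 0.1],
    "banana":   [0.0, 0.2, 0.9],
}

# A query that matches no stored vector exactly still resolves to a word.
query = [0.88, 0.12, 0.01]
similars = nearest_words(query, toy_vectors)
print(similars[0][0])  # the top-ranked word
```

As with most_similar(), the result is a list of (word, similarity) tuples, so the single best match sits at index [0][0]. The query above is deliberately a little off from the stored "computer" vector, which is why a nearest-neighbour search is used rather than an exact-coordinate lookup.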

In spaCy, this is now supported via nlp.vocab.vectors.most_similar:

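import spacy

nlp = spacy.load('en_core_web_md')  # a pipeline that ships with word vectors
word_vec = nlp("Test").vector
# most_similar takes a 2-D array of queries and returns (keys, best_rows, scores)
result = nlp.vocab.vectors.most_similar(word_vec.reshape((1, -1)))
# result[0] holds the keys; look up the top key's string in the vocab
print(nlp.vocab.strings[result[0][0, 0]], result)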


 