I was reading this answer, which says the following about Gensim's most_similar:

"it performs vector arithmetic: adding the positive vectors, subtracting the negative, then from that resulting position, listing the known-vectors closest to that angle."
But when I tested it, that does not seem to be the case. I trained a Word2Vec model with Gensim on the "text8" dataset and tested these two calls:
model.most_similar(positive=['woman', 'king'], negative=['man'])
>>> [('queen', 0.7131118178367615), ('prince', 0.6359186768531799),...]
model.wv.most_similar([model["king"] + model["woman"] - model["man"]])
>>> [('king', 0.84305739402771), ('queen', 0.7326322793960571),...]
They are clearly not the same: even the score for queen is 0.713 in the first call and 0.732 in the second.
So I ask again: how does Gensim's most_similar work, and why do the two results above differ?
The adding and subtracting isn't all that it does; for an exact description, you should look at the source code:
You'll see there that the addition and subtraction are performed on the unit-normed version of each vector, obtained via the get_vector(key, use_norm=True) accessor.
If you change your use of model[key] to model.get_vector(key, use_norm=True), you should see your outside-the-method calculation of the target vector give the same results as letting the method combine the positive and negative vectors.
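To make the difference concrete, here is a minimal sketch of the unit-norming behavior, using a toy embedding table with made-up vectors. The function name most_similar_sketch, the vocab dictionary, and all vector values are illustrative assumptions, not Gensim's actual code:

```python
import numpy as np

# Toy embedding table; the words and vector values are made up
# purely for illustration.
vocab = {
    "king":   np.array([2.0, 0.0, 1.0]),
    "queen":  np.array([1.8, 1.0, 1.0]),
    "man":    np.array([1.0, -0.2, 0.4]),
    "woman":  np.array([0.9, 0.5, 0.5]),
    "prince": np.array([1.5, 0.5, 0.9]),
}

def unit(v):
    """Return the unit-normed (length-1) version of a vector."""
    return v / np.linalg.norm(v)

def most_similar_sketch(positive, negative, topn=3):
    # Key point from the answer: the arithmetic is done on the
    # unit-normed vectors, not on the raw vectors from model[key].
    target = sum(unit(vocab[w]) for w in positive) \
           - sum(unit(vocab[w]) for w in negative)
    target = unit(target)
    # Gensim's most_similar also excludes the input keys themselves
    # from the ranking when called with words, which is another reason
    # the raw-vector call in the question could return 'king' itself.
    sims = {w: float(unit(v) @ target)
            for w, v in vocab.items()
            if w not in positive and w not in negative}
    return sorted(sims.items(), key=lambda kv: -kv[1])[:topn]

print(most_similar_sketch(["king", "woman"], ["man"]))
```

Note that in Gensim 4.x the keyword was renamed, so the accessor is get_vector(key, norm=True) rather than get_vector(key, use_norm=True).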