
Return the rank of word in Gensim Word2vec

I am working on a project using gensim's Word2Vec, and I am completely new to this field.

I already have a trained model. Is there a way to get the similarity rank of one word relative to another? For example, if the two most similar words to 'girl' are 'lady' and then 'woman', is there a function that returns 1 when I enter 'lady' and 2 when I enter 'woman'?

Thanks!

There's no gensim API for this, but you can use basic Python code to find which position (if any) a word appears in a longer sequence – such as the list of results given by gensim's most_similar().

For example:

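origin_word = 'apple'
query_word = 'orange'
# Ask for as many results as there are words, so every word gets a position.
# (Assumes gensim 4.x, where the vectors live on .wv and topn=0 returns an empty list.)
all_sims = w2v_model.wv.most_similar(origin_word, topn=len(w2v_model.wv.index_to_key))
query_index = -1  # stays -1 if query_word is never found
for i, sim_tuple in enumerate(all_sims):
    if sim_tuple[0] == query_word:
        query_index = i
        break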

At the end of this code, query_index will either be the (0-based) position of 'orange' in the list-of-all-similars, or -1 if not found.

Note that the most expensive step is the creation of the all_sims ordered-list of all similar words; if you are going to be checking the ranks of multiple query words against one origin word, you'd definitely want to keep the all_sims around rather than re-compute it each time.
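Since the question asks for 1-based ranks (1 for 'lady', 2 for 'woman'), here is a minimal sketch of a helper that reuses an already-computed list for several queries. The function name similarity_rank and the toy list are hypothetical, and the scores are made up; the only assumption about the input is that it is a list of (word, similarity) tuples sorted from most to least similar, as most_similar() returns.

```python
def similarity_rank(all_sims, query_word):
    """Return the 1-based rank of query_word in all_sims, or None if absent.

    all_sims is assumed to be a list of (word, similarity) tuples already
    sorted from most to least similar.
    """
    for rank, (word, _score) in enumerate(all_sims, start=1):
        if word == query_word:
            return rank
    return None


# Toy stand-in for real most_similar() output; the scores are invented.
toy_sims = [('lady', 0.81), ('woman', 0.77), ('boy', 0.70)]

print(similarity_rank(toy_sims, 'lady'))   # 1
print(similarity_rank(toy_sims, 'woman'))  # 2
print(similarity_rank(toy_sims, 'car'))    # None
```

Returning None rather than -1 for a missing word is only a design preference here; it avoids mistaking the sentinel for a real position.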

In fact, if you were sure you were going to do lots of such lookups, potentially down through the very-deepest words, you might do a single pass to change the results into a dict:

word_to_sims_index = {}
for i, sim_tuple in enumerate(all_sims):
    word_to_sims_index[sim_tuple[0]] = i  # key: the word, value: its 0-based position

After that, finding the index of a word would be a (quick constant-time) dict lookup...

query_index = word_to_sims_index[query_word]

...that will throw a KeyError if the query word isn't in the dict. (You could use word_to_sims_index.get(query_word, -1) if you instead wanted a default -1 response when the key is not present.)

I think this is a duplicate. As another answer there says, you can use model.rank('girl', 'lady') == 1.
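To make concrete what a rank lookup computes, here is a rough pure-Python sketch that finds a rank without sorting anything: it counts how many other words are strictly more similar to the origin word than the query word is. This is not gensim's implementation; the names cosine and rank_by_counting and the two-dimensional toy vectors are hypothetical, chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_counting(vectors, origin_word, query_word):
    """1-based rank of query_word among the words most similar to origin_word."""
    origin_vec = vectors[origin_word]
    query_sim = cosine(origin_vec, vectors[query_word])
    # Count every other word that is strictly closer to the origin word.
    closer = sum(
        1
        for word, vec in vectors.items()
        if word != origin_word and cosine(origin_vec, vec) > query_sim
    )
    return closer + 1


# Invented toy vectors, not taken from any trained model.
toy_vectors = {
    'girl': [1.0, 0.0],
    'lady': [0.9, 0.1],
    'woman': [0.8, 0.3],
    'car': [0.0, 1.0],
}

print(rank_by_counting(toy_vectors, 'girl', 'lady'))   # 1
print(rank_by_counting(toy_vectors, 'girl', 'woman'))  # 2
```

A single pass like this costs one similarity computation per vocabulary word but skips the sort, which is why a one-off rank query can be cheaper than building the full ordered list.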
