
How to get the tf-idf values in gensim in python

I am calculating my tf-idf values as follows using gensim.

from gensim import corpora, models

texts = [['human', 'interface', 'computer'],
 ['survey', 'user', 'computer', 'system', 'response', 'time'],
 ['eps', 'user', 'interface', 'system'],
 ['system', 'human', 'system', 'eps'],
 ['user', 'response', 'time'],
 ['trees'],
 ['graph', 'trees'],
 ['graph', 'minors', 'trees'],
 ['graph', 'minors', 'survey']]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
tfidf = models.TfidfModel(corpus)

Now, I want to get the 3 words that have the highest tf-idf values. Please help me!

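After a bit of searching, it looks like you might want something like this. It's not the most readable, but it should work.

weights = [(dictionary[term_id], weight)
           for doc in tfidf[corpus] for term_id, weight in doc]
top_3 = [word for word, weight in
         sorted(weights, key=lambda t: t[1], reverse=True)[:3]]

Applying the model to the corpus with tfidf[corpus] gives, for each document, a list of (term_id, weight) pairs. I map each term id back to its word with dictionary[term_id] and collect (word, weight) tuples across all documents. I then sort the tuples by weight in descending order, take the top 3 (using [:3] ), and pull the word out of each tuple with word for word, weight in ... . Note that the same word can appear more than once in the result if it scores highly in several documents.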
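For context on what the values being sorted mean, here is a minimal pure-Python sketch of the weighting. It assumes gensim's default TfidfModel settings as I understand them: raw term count times log2(N / df), with each document then scaled to unit length. The function name tfidf_weights is made up for this illustration, and this is not gensim's own code.

```python
import math
from collections import Counter

texts = [['human', 'interface', 'computer'],
         ['survey', 'user', 'computer', 'system', 'response', 'time'],
         ['eps', 'user', 'interface', 'system'],
         ['system', 'human', 'system', 'eps'],
         ['user', 'response', 'time'],
         ['trees'],
         ['graph', 'trees'],
         ['graph', 'minors', 'trees'],
         ['graph', 'minors', 'survey']]


def tfidf_weights(docs):
    """Return one {word: weight} dict per document (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each word occur?
    df = Counter(word for doc in docs for word in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        # Assumed default weighting: tf * log2(N / df)
        raw = {w: tf[w] * math.log2(n_docs / df[w]) for w in tf}
        # Scale the document vector to unit (L2) length
        norm = math.sqrt(sum(v * v for v in raw.values()))
        result.append({w: v / norm for w, v in raw.items()} if norm else {})
    return result


weights = tfidf_weights(texts)
for doc_weights in weights:
    print(doc_weights)
```

Because of the per-document scaling, a document with a single word (like ['trees'] here) gives that word a weight of 1.0, so short documents tend to dominate a global "highest value" ranking.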

This can easily be modified to keep any number of words, in order, by changing the [:3] slice.
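As a rough sketch of that generalisation, the helper below keeps each word's best weight across all documents and returns the n highest, so no word is listed twice. The name top_n_words and the toy weights are made up for illustration (they are not real gensim output). The function only assumes that each document is a list of (term_id, weight) pairs and that id2word can be indexed by term id.

```python
def top_n_words(weighted_docs, id2word, n=3):
    """Return the n (word, weight) pairs with the highest weight.

    Each word is counted once, using its best weight in any document.
    """
    best = {}
    for doc in weighted_docs:
        for term_id, weight in doc:
            word = id2word[term_id]
            if weight > best.get(word, 0.0):
                best[word] = weight
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Toy stand-ins for dictionary and tfidf[corpus] (hypothetical values).
id2word = {0: 'graph', 1: 'minors', 2: 'trees'}
weighted_docs = [
    [(2, 1.0)],
    [(0, 0.6), (2, 0.8)],
    [(0, 0.5), (1, 0.7), (2, 0.5)],
]

top_2 = top_n_words(weighted_docs, id2word, n=2)
print(top_2)  # [('trees', 1.0), ('minors', 0.7)]
```

With the objects from the question, a call like top_n_words(tfidf[corpus], dictionary, 3) should work the same way, since a gensim Dictionary can be indexed by term id.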
