I am calculating my tf-idf values as follows using gensim:
from gensim import corpora, models

texts = [['human', 'interface', 'computer'],
         ['survey', 'user', 'computer', 'system', 'response', 'time'],
         ['eps', 'user', 'interface', 'system'],
         ['system', 'human', 'system', 'eps'],
         ['user', 'response', 'time'],
         ['trees'],
         ['graph', 'trees'],
         ['graph', 'minors', 'trees'],
         ['graph', 'minors', 'survey']]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
tfidf = models.TfidfModel(corpus)
Now I want to get the 3 words that have the highest tf-idf values. Please help me!
After a bit of searching, it looks like you might want this - it's not the most readable but it might work.
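weights = [dict(tfidf[bow]) for bow in corpus]  # per document: {term id: tf-idf weight}
top_3 = [t[0] for t in
         sorted([(word, i, j) for j, text in enumerate(texts) for i, word in enumerate(text)],
                key=lambda t: weights[t[2]].get(dictionary.token2id[t[0]], 0.0),
                reverse=True)[:3]]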
I take the words from the texts and keep track of each word's position within its document (as i) and the document it came from (as j) with a tuple of the form (word, i, j). I then sort the words based on their value in tfidf. I then take the top 3 (using [:3]) and take the word out of each tuple with t[0] for t in ....
This can easily be modified to store any number of the words in order.
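One way to generalize this to any number of words is sketched below. A sort over every word occurrence can list the same word more than once, so this version keeps only the best weight seen for each term and then picks the n largest. The names top_n_words, weighted_docs and id2word are hypothetical helpers for illustration, not part of gensim; the function only assumes each document is an iterable of (term id, weight) pairs and that id2word maps a term id back to its word.

```python
import heapq


def top_n_words(weighted_docs, id2word, n=3):
    """Return the n distinct words with the highest weight in any document."""
    best = {}  # term id -> highest weight seen so far
    for doc in weighted_docs:
        for term_id, weight in doc:
            if weight > best.get(term_id, float("-inf")):
                best[term_id] = weight
    # Pick the n term ids with the largest stored weight, highest first.
    top_ids = heapq.nlargest(n, best, key=best.get)
    return [id2word[term_id] for term_id in top_ids]
```

With the objects from the question, it should be callable as top_n_words(tfidf[corpus], dictionary, n=3), since tfidf[corpus] yields one list of (term id, weight) pairs per document and dictionary[term_id] returns the word.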