Getting the most important words in a corpus using tf-idf (Gensim)

Question

I am computing tf-idf as follows.

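from collections import namedtuple
from gensim import corpora, models

texts = ['human interface computer',
 'survey user computer system response time',
 'eps user interface system',
 'system human system eps',
 'user response time']
# corpora.Dictionary expects tokenized documents (lists of words), not raw strings
texts = [text.split() for text in texts]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
analyzedDocument = namedtuple('AnalyzedDocument', 'word tfidf_score')
d = []
# Collect one (word, score) entry per word per document
for doc in corpus_tfidf:
    for word_id, value in doc:
        word = dictionary[word_id]
        score = value
        d.append(analyzedDocument(word, score))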

However, I now want to identify the 3 most important words in the corpus, using the words with the highest idf values. How can I do that?

Answer 1

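Assuming your list is built correctly, you should be able to sort it by score. At the top:

from operator import itemgetter

Then at the bottom:

# Sort by tfidf_score (field 1), highest first, then keep the first three
e = sorted(d, key=itemgetter(1), reverse=True)
top3 = e[:3]
print(top3)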
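
Note that d holds one entry per word per document, so the same word can show up more than once in a ranking of d, and the scores there are tf-idf values rather than idf values. If what you want is a ranking by idf alone, here is a minimal sketch in plain Python. It assumes idf = log2(N / df), which I believe mirrors Gensim's default weighting, and the names docs, df, idf and top3_idf are my own, not from the question. With Gensim itself, the trained model should expose the same values through its idfs attribute, keyed by word id.

```python
import math
from collections import Counter

texts = [
    'human interface computer',
    'survey user computer system response time',
    'eps user interface system',
    'system human system eps',
    'user response time',
]
docs = [text.split() for text in texts]

# Document frequency: the number of documents that contain each word
df = Counter(word for doc in docs for word in set(doc))

# idf = log2(N / df); assumed here to match Gensim's default weighting
n_docs = len(docs)
idf = {word: math.log2(n_docs / count) for word, count in df.items()}

# Highest idf first; ties are broken alphabetically so the result is stable
top3_idf = sorted(idf.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(top3_idf)
```

In a corpus this small many words share the same document frequency, so ties are common and the tie-breaking rule decides which words make the top 3.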
