How do I extract the 10 most frequent and the 10 least frequent words in Python?

Question

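After running a few lines of code that end with vocabulary as the last line, I get an output. It gives me 46132 distinct words and tells me how many times each word occurs in the document.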

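I have attached a screenshot of the output below. I am not sure what format vocabulary is in. I need to extract the 10 most frequent and the 10 least frequent words in the document. I am not sure how to do that, possibly because I don't know whether the output is a str or a tuple.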

Can I just use max(vocabulary) to get the most frequent word in the document? Or call sorted(vocabulary) and take the first 10 and the last 10 as the 10 most frequent and the 10 least frequent words?

Answer 1

The collections.Counter class makes it easy to get the k most common words:

>>> vocabulary = { 'apple': 7, 'ball': 1, 'car': 3, 'dog': 6, 'elf': 2 }
>>> from collections import Counter
>>> vocabulary = Counter(vocabulary)
>>> vocabulary.most_common(2)
[('apple', 7), ('dog', 6)]
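
On the question's max(vocabulary) idea, here is a minimal sketch, assuming vocabulary is a dict (or Counter) mapping each word to its count, which the question does not confirm. Iterating over a dict yields its keys, so max(vocabulary) and sorted(vocabulary) compare the words alphabetically and ignore the counts. Passing key=vocabulary.get makes the comparison use the counts instead:

```python
# Assumed sample data: word -> count, the same shape as the example above.
vocabulary = {'apple': 7, 'ball': 1, 'car': 3, 'dog': 6, 'elf': 2}

# Iterating a dict yields its keys, so this is the alphabetically last word.
last_alphabetically = max(vocabulary)

# key=vocabulary.get makes max/min compare the counts rather than the words.
most_frequent = max(vocabulary, key=vocabulary.get)
least_frequent = min(vocabulary, key=vocabulary.get)

print(last_alphabetically)  # elf
print(most_frequent)        # apple
print(least_frequent)       # ball
```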

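Getting the least common words is a bit trickier. The simplest way is probably to sort the dictionary's key/value pairs by value and then take a slice: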

>>> sorted(vocabulary.items(), key=lambda x: x[1])[:2]
[('ball', 1), ('elf', 2)]
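
If sorting all 46132 entries feels wasteful, the standard library's heapq module offers a possible alternative. This is only a sketch on the same assumed sample data: heapq.nsmallest and heapq.nlargest return the n entries with the lowest or highest key value without requiring you to sort the whole sequence yourself.

```python
import heapq

# Assumed sample data: word -> count.
vocabulary = {'apple': 7, 'ball': 1, 'car': 3, 'dog': 6, 'elf': 2}

# The two (word, count) pairs with the smallest counts, lowest first.
least_common = heapq.nsmallest(2, vocabulary.items(), key=lambda x: x[1])

# The two (word, count) pairs with the largest counts, highest first.
most_common = heapq.nlargest(2, vocabulary.items(), key=lambda x: x[1])

print(least_common)  # [('ball', 1), ('elf', 2)]
print(most_common)   # [('apple', 7), ('dog', 6)]
```

For a small n such as 10 this tends to be a good fit; for a large n, a full sort is usually simpler.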

Since you need both, you might as well sort only once and take two slices; that way you don't need Counter at all:

>>> sorted_vocabulary = sorted(vocabulary.items(), key=lambda x: x[1])
>>> most_common = sorted_vocabulary[-2:][::-1]
>>> least_common = sorted_vocabulary[:2]
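
The sort-once approach can be wrapped in a small helper for the 10-and-10 case from the question. This is a sketch, not a definitive implementation: the function name most_and_least_common and its parameter n are made up for illustration, and it assumes vocabulary is a dict-like mapping of word to count.

```python
def most_and_least_common(vocabulary, n=10):
    """Return (most_common, least_common) as lists of (word, count) pairs.

    Hypothetical helper; assumes vocabulary maps each word to its count.
    """
    # Sort once by count, ascending.
    ordered = sorted(vocabulary.items(), key=lambda item: item[1])
    # The first n entries have the lowest counts.
    least_common = ordered[:n]
    # Reverse so the highest count comes first, then take the first n.
    most_common = ordered[::-1][:n]
    return most_common, least_common


# Assumed sample data: word -> count.
vocabulary = {'apple': 7, 'ball': 1, 'car': 3, 'dog': 6, 'elf': 2}
most, least = most_and_least_common(vocabulary, n=2)
print(most)   # [('apple', 7), ('dog', 6)]
print(least)  # [('ball', 1), ('elf', 2)]
```

With a real vocabulary, many words are likely to share the lowest count. In that case, which of them land in the bottom n depends only on the order of the dict, so the "10 least frequent" words are not uniquely defined.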

Solution 1
0 Accepted 2019-11-17 05:56:50
