Finding word frequencies in Python and sorting them in descending order

Question

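I am trying to find the frequency of each word in a .txt file and to improve the output by sorting the words by their number of occurrences.

So far I have completed 90% of the task. What remains is to sort the words by number of occurrences in descending order.

Here is my code: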

import re


def frequency_check(lines):
    print("Frequency of words in file")
    words = re.findall(r"\w+", lines)
    item_list = []

    for item in words:
        if item not in item_list:
            item_count = words.count(item)
            print("{} : {} times".format(item, item_count))
            item_list.append(item)


with open("original-3.txt", 'r') as file1:
    lines = file1.read().lower()
    frequency_check(lines)

This is the .txt file in which I am finding the word frequencies,

This is the output I get:

Frequency of words in file
return : 2 times
all : 1 times
non : 1 times
overlapping : 1 times
matches : 3 times
of : 5 times
pattern : 3 times
in : 4 times
string : 2 times
as : 1 times
a : 3 times
list : 3 times
strings : 1 times
the : 6 times
is : 1 times
scanned : 1 times
left : 1 times
to : 1 times
right : 1 times
and : 1 times
are : 3 times
returned : 1 times
order : 1 times
found : 1 times
if : 2 times
one : 2 times
or : 1 times
more : 2 times
groups : 2 times
present : 1 times
this : 1 times
will : 1 times
be : 1 times
tuples : 1 times
has : 1 times
than : 1 times
group : 1 times
empty : 1 times
included : 1 times
result : 1 times
unless : 1 times
they : 1 times
touch : 1 times
beginning : 1 times
another : 1 times
match : 1 times

Process finished with exit code 0

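Sorting these and printing them from the highest number of occurrences to the lowest would be a great challenge.

PS: I thought about using a dictionary; however, dictionaries are immutable and I can't use the sort method on them.

Any ideas?

Thanks a lot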

Answer 1

I agree with @lolu that you should use a dictionary, but if you still want to use a list, here is a solution:

import re


def frequency_check(lines):
    print("Frequency of words in file")
    words = re.findall(r"\w+", lines)
    unique_words = set(words)
    item_list = []

    for item in unique_words:
        item_count = words.count(item)
        item_list.append((item, item_count))

    item_list.sort(key=lambda t: (t[1], t[0]), reverse=True)
    for item, item_count in item_list:
        print("{} : {} times".format(item, item_count))


with open("original-3.txt", 'r') as file1:
    lines = file1.read().lower()
    frequency_check(lines)
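
One detail of the sort key above: with reverse=True, words that share the same count also come out in reverse alphabetical order. If alphabetical order within equal counts is preferred, one option is to negate the count instead of reversing. A minimal sketch, where the sample pairs are made up for illustration:

```python
# Hypothetical (word, count) pairs, standing in for item_list above.
item_list = [("of", 5), ("the", 6), ("in", 4), ("a", 3), ("list", 3), ("are", 3)]

# Negating the count sorts counts high-to-low while words stay A-to-Z.
ordered = sorted(item_list, key=lambda t: (-t[1], t[0]))

for word, count in ordered:
    print("{} : {} times".format(word, count))
```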

Even better, use collections.Counter:

import re
from collections import Counter


def frequency_check(lines):
    print("Frequency of words in file")
    words = re.findall(r"\w+", lines)
    word_counts = Counter(words)
    for item, item_count in word_counts.most_common():
        print("{} : {} times".format(item, item_count))


with open("original-3.txt", 'r') as file1:
    lines = file1.read().lower()
    frequency_check(lines)
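
Counter also tallies the words in a single pass, whereas calling words.count(item) for every unique word rescans the whole list each time. The sketch below works on an in-memory string and shows that most_common accepts an optional n to limit the output. The sample text and the top_words helper name are assumptions for illustration:

```python
import re
from collections import Counter


def top_words(text, n=None):
    """Return (word, count) pairs, most frequent first; n limits the length."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(n)


sample = "the cat and the hat and the bat"
for word, count in top_words(sample, 2):
    print("{} : {} times".format(word, count))
```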

Answer 2

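I still think you should use a dictionary. Dictionaries are mutable. However, for your exact output you can use the "sorted" function, which works on both lists and dictionaries.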
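
To illustrate the point about mutability, here is a minimal sketch that builds the counts in a plain dictionary (the sample words are made up). A dictionary has no sort method of its own, but sorted accepts its items:

```python
words = ["the", "of", "the", "in", "of", "the"]

counts = {}
for word in words:
    # Dictionaries are mutable: entries can be added and updated in place.
    counts[word] = counts.get(word, 0) + 1

# sorted() returns a new list of (word, count) pairs, highest count first.
by_count = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
print(by_count)
```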

For your current list, in the form you are producing it:

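lst = ["order : 1 times", "returned : 3 times"]
# Convert the count to int so it compares numerically; reverse=True puts the highest first.
new_lst = sorted(lst, key=lambda x: int(x.split(" ")[2]), reverse=True)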

Note that when you split the way I did, the integer value is at index 2.

sorted gives you a new list. If you would rather sort the list you already have in place, you can use the "sort" method that every list has:

lst.sort(key=lambda x: int(x.split(" ")[2]), reverse=True)
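
One thing to watch when sorting on the split text: the count is still a string at that point, and strings compare character by character, so "10" sorts before "2". A short sketch that converts the count with int and sorts from highest to lowest. The sample lines and the count_of helper name are made up for illustration:

```python
lst = ["order : 2 times", "the : 10 times", "returned : 3 times"]


def count_of(line):
    # "the : 10 times".split(" ") -> ["the", ":", "10", "times"]
    return int(line.split(" ")[2])


as_text = sorted(lst, key=lambda x: x.split(" ")[2])   # compares "10" < "2"
as_number = sorted(lst, key=count_of, reverse=True)    # compares 10 > 3 > 2

print(as_text)
print(as_number)
```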

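If you choose to switch to a dictionary, note that in my example the keys are the words and the values are the counts. You can use this instead: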

xs = {"order": 3, "and": 15}
# Sort the (word, count) pairs by count, highest first.
sorted(xs.items(), key=lambda x: x[1], reverse=True)
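
sorted returns a list of (word, count) tuples rather than a dictionary. If a dictionary in that order is wanted, one option is to rebuild it from the sorted pairs, relying on dictionaries preserving insertion order (guaranteed from Python 3.7). A sketch using the same sample data:

```python
xs = {"order": 3, "and": 15}

# Sort the (word, count) pairs by count, highest first.
pairs = sorted(xs.items(), key=lambda x: x[1], reverse=True)

# dict() keeps the order in which the pairs are inserted (Python 3.7+).
ordered = dict(pairs)

for word, count in ordered.items():
    print("{} : {} times".format(word, count))
```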

2 answers

Answer 1 (score 3, accepted, 2020-05-31 11:17:09)

Answer 2 (score 1, 2020-05-31 11:09:42)