How to find the most frequent words in alphabetical order?
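I am trying to write a program that finds the most frequent words in a text file and prints them in alphabetical order.
For example, the word "that" is the most frequent word in the text file, so it should be printed first: "that #".
The program, and the answers below, must follow this format: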
d = dict()

def counter_one():
    d = dict()
    word_file = open('gg.txt')
    for line in word_file:
        word = line.strip().lower()
        d = counter_two(word, d)
    return d

def counter_two(word, d):
    d = dict()
    word_file = open('gg.txt')
    for line in word_file:
        if word not in d:
            d[word] = 1
        else:
            d[word] + 1
    return d

def diction(d):
    for key, val in d.iteritems():
        print key, val

counter_one()
diction(d)
When run in the shell, it should produce output like this:
>>>
Words in text: ###
Frequent Words: ###
that 11
the 11
we 10
which 10
>>>
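An easy way to get frequency counts is to use the Counter class from the built-in collections module. You can pass it a list of words, and it will count them and map each word to its frequency.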
from collections import Counter

frequencies = Counter()
with open('gg.txt') as f:
    for line in f:
        frequencies.update(line.lower().split())
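I used the lower() method to avoid counting "the" and "The" separately.
You can then output the words in order of frequency with frequencies.most_common(), or with frequencies.most_common(n) if you only want the top n.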
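As a small illustration of most_common(), here is a sketch that uses a made-up in-memory word list in place of gg.txt. The sample lines are an assumption for illustration only, and the syntax is Python 3:

```python
from collections import Counter

# Hypothetical sample text standing in for the contents of gg.txt.
sample_lines = ["The cat and the dog", "the bird and THE fish"]

frequencies = Counter()
for line in sample_lines:
    frequencies.update(line.lower().split())

# All (word, count) pairs, highest count first.
all_pairs = frequencies.most_common()

# Only the two most frequent words.
top_two = frequencies.most_common(2)
print(top_two)  # [('the', 4), ('and', 2)]
```

Because every line is lowercased before counting, "The", "the" and "THE" all contribute to the same entry.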
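If you want to sort the resulting list by frequency, and then alphabetically among words with the same frequency, you can use the sorted built-in function with the key argument lambda item: (-item[1], item[0]). Negating the count puts the highest counts first while the words themselves stay in ascending order. So the final code to do this would be:
from collections import Counter

frequencies = Counter()
with open('gg.txt') as f:
    for line in f:
        frequencies.update(line.lower().split())

# Sort by descending count, then alphabetically for equal counts
most_frequent = sorted(frequencies.most_common(4), key=lambda item: (-item[1], item[0]))
for (word, count) in most_frequent:
    print('%s %s' % (word, count))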
The output will then be:
that 11
the 11
we 10
which 10
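One caveat with sorting only the result of most_common(4): when several words share the count at the cutoff, which of them end up in the top four is not guaranteed to be decided alphabetically. A sketch of one way around this, assuming Python 3 and a hypothetical helper named top_words, is to sort every (word, count) pair first and only then keep the first n:

```python
from collections import Counter

def top_words(counts, n):
    """Return the n most frequent (word, count) pairs.

    Ties in count are broken alphabetically. `top_words` is a
    hypothetical helper name, not part of the original answer.
    """
    # Negating the count sorts high counts first while keeping
    # the words themselves in ascending (alphabetical) order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:n]

# Made-up counts for illustration only.
counts = Counter({"which": 10, "we": 10, "the": 11, "that": 11, "a": 3})
for word, count in top_words(counts, 4):
    print(word, count)
```

This sorts the whole vocabulary rather than just four items, which is usually acceptable for a single text file.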
You can use Counter from the collections module to simplify this. First count the words, then sort by each word's number of occurrences and by the word itself:
from collections import Counter

# Load the file and extract the words
with open("gettysburg_address.txt") as f:
    lines = f.readlines()
words = [w for l in lines for w in l.rstrip().split()]
print('Words in text: %d' % len(words))

# Use Counter to get the counts
counts = Counter(words)

# Sort the (word, count) tuples by the count, then the word itself,
# and output the k most frequent. reverse=True applies to both parts of
# the key, so words with equal counts come out in reverse alphabetical order.
k = 4
print('Frequent words:')
for w, c in sorted(counts.most_common(k), key=lambda wc: (wc[1], wc[0]), reverse=True):
    print('%s %s' % (w, c))
Output:
Words in text: 278
Frequent words:
that 13
the 9
we 8
to 8
Why do you keep reopening the file and creating new dictionaries? What does your code actually need to do?
create a new empty dictionary to store words {word: count}
open the file
work through each line (word) in the file
    if the word is already in the dictionary
        increment count by one
    if not
        add to dictionary with count 1
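The steps above can be sketched in Python 3 as follows. The function name count_words and the sample data are hypothetical, and the sketch assumes, as the pseudocode does, that each line holds a single word:

```python
def count_words(lines):
    """Count words given an iterable of lines, one word per line.

    `lines` can be an open file object or a list of strings.
    """
    counts = {}  # maps word -> number of occurrences
    for line in lines:
        word = line.strip().lower()
        if not word:
            continue  # skip blank lines
        if word in counts:
            counts[word] += 1  # already seen: increment the count
        else:
            counts[word] = 1   # first occurrence
    return counts

# Made-up sample standing in for the lines of gg.txt.
sample = ["that\n", "The\n", "that\n", "\n", "the\n", "we\n"]
dictionary = count_words(sample)
print(dictionary)
```

To count a real file, open it once and pass the file object in, for example inside a with open('gg.txt') as f: block. Note that the count is updated with +=, whereas the d[word] + 1 in the question computes a value and discards it.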
Then you can easily get the number of distinct words
len(dictionary)
and the n most common words with their counts
sorted(dictionary.items(), key=lambda x: x[1], reverse=True)[:n]
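As a quick check of what these two expressions give, here is a sketch on made-up counts (Python 3; the sample dictionary and the variable names are assumptions for illustration). It also shows that len(dictionary) counts distinct words, while the total number of words in the text is the sum of the counts:

```python
# Made-up counts for illustration only.
dictionary = {"that": 11, "the": 11, "we": 10, "which": 10, "a": 3}
n = 2

distinct_words = len(dictionary)        # number of different words
total_words = sum(dictionary.values())  # number of words overall

# Same expression as above: sort by count, highest first, keep n.
top_n = sorted(dictionary.items(), key=lambda x: x[1], reverse=True)[:n]

print(distinct_words, total_words)
print(top_n)
```

Since the key looks only at the count, words with equal counts are not ordered alphabetically by this expression.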