Python 3.8: sorting dictionary values alphabetically

Question

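This code is meant to read a text file and add each word to a dictionary, where the key is the first letter and the value is the set of all words in the file that start with that letter. It sort of works, but I've run into two problems:

The dictionary keys include apostrophes and periods (how can I exclude them?)
The values aren't sorted alphabetically and are all jumbled up. The code ends up printing something like this: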

' - {"don't", "i'm", "let's"}
. - {'below.', 'farm.', 'them.'}
a - {'take', 'masters', 'can', 'fallow'}
b - {'barnacle', 'labyrinth', 'pebble'}
...
...
y - {'they', 'very', 'yellow', 'pastry'}

when it should look more like:

a - {'ape', 'army','arrow', 'arson',}
b - {'bank', 'blast', 'blaze', 'breathe'}
etc

# make empty dictionary
dic = {}

# read file
infile = open('file.txt', "r")

# read first line
lines = infile.readline()
while lines != "":
    # split the words up and remove "\n" from the end of the line
    lines = lines.rstrip()
    lines = lines.split()

    for word in lines:
        for char in word:
            # add if not in dictionary
            if char not in dic:
                dic[char.lower()] = set([word.lower()])
            # Else, add word to set
            else:
                dic[char.lower()].add(word.lower())
    # Continue reading
    lines = infile.readline()

# Close file
infile.close()

# Print
for letter in sorted(dic): 
    print(letter + " - " + str(dic[letter]))

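I'm guessing I need to strip punctuation and apostrophes from the whole file when I first iterate over it, before adding anything to the dictionary? I'm completely lost on how to get the values in the right order, though.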

Answer 1

After stripping any leading punctuation, use defaultdict(set) and dic[word[0]].add(word). There is no need for the inner loop.
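A minimal sketch of that suggestion, under a few assumptions that are not in the original answer: the helper name `group_by_first_letter` and the sample sentence are made up for illustration, punctuation is stripped from both ends of each token with `str.strip(string.punctuation)`, and tokens that do not start with a letter are skipped.

```python
from collections import defaultdict
import string


def group_by_first_letter(text):
    """Map each lowercase first letter to the set of words starting with it."""
    groups = defaultdict(set)
    for token in text.split():
        # Strip punctuation from both ends; inner apostrophes ("don't") survive
        word = token.strip(string.punctuation).lower()
        # Skip tokens that were pure punctuation or that start with a digit
        if word and word[0].isalpha():
            groups[word[0]].add(word)
    return groups


# Hypothetical sample text standing in for the contents of file.txt
sample = "Don't feed the ducks. The dog ate 'apples' and 42 bananas!"
groups = group_by_first_letter(sample)
for letter in sorted(groups):
    # Sets are unordered, so sort the values at print time
    print(letter, "-", ", ".join(sorted(groups[letter])))
```

Because each word is filed only under `word[0]`, words no longer appear under every letter they contain, and calling `sorted()` on each set when printing gives the alphabetical order the question asks for.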

Answer 2

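from collections import defaultdict


def process_file(fn):
    """Group the words in file fn by their lowercased first letter."""
    my_dict = defaultdict(set)
    # Use a context manager so the file is closed after reading
    with open(fn, 'r') as infile:
        for word in infile.read().split():
            # Skip tokens that start with punctuation or a digit
            if word[0].isalpha():
                # Lowercase the word too, so 'The' and 'the' are one entry
                my_dict[word[0].lower()].add(word.lower())
    return my_dict


word_dict = process_file('file.txt')
for letter in sorted(word_dict):
    print(letter + " - " + ', '.join(sorted(word_dict[letter])))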

Answer 3

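You have several problems to solve:

splitting the words on both whitespace and punctuation
adding a word to a set that does not exist yet on the first addition
sorting the output

Here is a short program that tries to address the issues above: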

import re, string

# instead of using "text = open(filename).read()" we exploit a piece
# of text contained in one of the imported modules
text = re.__doc__

# 1. how to split at once the text contained in the file
#
# credit to https://stackoverflow.com/a/13184791/2749397
p_ws = string.punctuation + string.whitespace
words = re.split('|'.join(re.escape(c) for c in p_ws), text)

# 2. how to instantiate a set when we do the first addition to a key,
#    that is, using the .setdefault method of every dictionary
d = {}
# Note: words regularized by lowercasing, we skip the empty tokens    
for word in (w.lower() for w in words if w):
    d.setdefault(word[0], set()).add(word)

# 3. how to print the sorted entries corresponding to each letter
for letter in sorted(d.keys()):
    print(letter, *sorted(d[letter]))

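My text contains digits, so digits show up in the output of the program above (see below); if you don't want the digits, filter them out with if letter not in '0123456789': print(...).

Here is the output...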

0 0
1 1
8 8
9 9
a a above accessible after ailmsux all alphanumeric alphanumerics also an and any are as ascii at available
b b backslash be before beginning behaviour being below bit both but by bytes
c cache can case categories character characters clear comment comments compatibility compile complement complementing concatenate consist consume contain contents corresponding creates current
d d decimal default defined defines dependent digit digits doesn dotall
e each earlier either empty end equivalent error escape escapes except exception exports expression expressions
f f find findall finditer first fixed flag flags following for forbidden found from fullmatch functions
g greedy group grouping
i i id if ignore ignorecase ignored in including indicates insensitive inside into is it iterator
j just
l l last later length letters like lines list literal locale looking
m m made make many match matched matches matching means module more most multiline must
n n name named needn newline next nicer no non not null number
o object occurrences of on only operations optional or ordinary otherwise outside
p p parameters parentheses pattern patterns perform perl plus possible preceded preceding presence previous processed provides purge
r r range rather re regular repetitions resulting retrieved return
s s same search second see sequence sequences set signals similar simplest simply so some special specified split start string strings sub subn substitute substitutions substring support supports
t t takes text than that the themselves then they this those three to
u u underscore unicode us
v v verbose version versions
w w well which whitespace whole will with without word
x x
y yes yielding you
z z z0 za

Without the comments, and with a little obfuscation, it is just 3 lines of code...

import re, string
text = re.__doc__
p_ws = string.punctuation + string.whitespace
words = re.split('|'.join(re.escape(c) for c in p_ws), text)

d, add2d = {}, lambda w: d.setdefault(w[0],set()).add(w) #1
for word in (w.lower() for w in words if w): add2d(word) #2
for abc in sorted(d.keys()): print(abc, *sorted(d[abc])) #3
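The digit filter mentioned earlier in this answer could look roughly like the sketch below. The sample sentence is a made-up stand-in for the file contents, and the intermediate list `letter_keys` is an assumption added here for readability; neither is part of the original program.

```python
import re
import string

# Hypothetical sample text standing in for the contents of the file
text = "3 apples, 2 bananas and 10 cherries; all 15 were ripe."

# Split on every punctuation and whitespace character, as in the program above
p_ws = string.punctuation + string.whitespace
words = re.split('|'.join(re.escape(c) for c in p_ws), text)

d = {}
for word in (w.lower() for w in words if w):
    d.setdefault(word[0], set()).add(word)

# Keep only the keys that are not digits, in sorted order
letter_keys = [key for key in sorted(d) if key not in '0123456789']
for letter in letter_keys:
    print(letter, *sorted(d[letter]))
```

The digit entries are still stored in `d`; they are only left out when printing. To keep them out of the dictionary altogether, the same test could be applied to `word[0]` before calling `setdefault`.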


Answer 1
1 2020-02-12 10:32:55

Answer 2
1 2020-02-12 10:38:25

Answer 3
0 2020-02-12 12:19:21
