Python 3.8 dictionary value sorting alphabetically
This code is meant to read a text file and add every word to a dictionary where the key is the first letter and the values are all the words in the file that start with that letter. It kind of works, except for two problems I run into:
' - {"don't", "i'm", "let's"}
. - {'below.', 'farm.', 'them.'}
a - {'take', 'masters', 'can', 'fallow'}
b - {'barnacle', 'labyrinth', 'pebble'}
...
...
y - {'they', 'very', 'yellow', 'pastry'}
when it should be more like:
a - {'ape', 'army', 'arrow', 'arson'}
b - {'bank', 'blast', 'blaze', 'breathe'}
etc
# make empty dictionary
dic = {}
# read file
infile = open('file.txt', "r")
# read first line
lines = infile.readline()
while lines != "":
    # split the words up and remove "\n" from the end of the line
    lines = lines.rstrip()
    lines = lines.split()
    for word in lines:
        for char in word:
            # add if not in dictionary
            if char not in dic:
                dic[char.lower()] = set([word.lower()])
            # Else, add word to set
            else:
                dic[char.lower()].add(word.lower())
    # Continue reading
    lines = infile.readline()
# Close file
infile.close()
# Print
for letter in sorted(dic):
    print(letter + " - " + str(dic[letter]))
I'm guessing I need to remove the punctuation and apostrophes from the whole file when I'm first iterating through it, but before adding anything to the dictionary? I'm totally lost on getting the values in the right order, though.
Use defaultdict(set) and dic[word[0]].add(word), after removing any starting punctuation. No need for the inner loop.
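from collections import defaultdict

def process_file(fn):
    # Map each first letter to the set of words starting with it
    my_dict = defaultdict(set)
    # The with-block closes the file once it has been read
    with open(fn, 'r') as infile:
        for word in infile.read().split():
            # Skip words that start with punctuation or a digit
            if word[0].isalpha():
                my_dict[word[0].lower()].add(word.lower())
    return my_dict

word_dict = process_file('file.txt')
for letter in sorted(word_dict):
    print(letter + " - " + ', '.join(sorted(word_dict[letter])))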
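Note that the function above skips any word whose first character is not a letter; it does not strip the punctuation. Here is a minimal sketch of the stripping variant, assuming that trimming with `str.strip(string.punctuation)` is an acceptable way to "remove punctuation". It trims both ends of each word, so 'farm.' becomes 'farm' while the apostrophe inside "don't" is kept. The helper name `group_words` and the sample sentence are made up for illustration.

```python
import string
from collections import defaultdict

def group_words(text):
    """Group the words of `text` by their first letter (hypothetical helper)."""
    groups = defaultdict(set)
    for raw in text.split():
        # Trim punctuation from both ends and normalize the case
        word = raw.strip(string.punctuation).lower()
        # Skip tokens that were pure punctuation or do not start with a letter
        if word and word[0].isalpha():
            groups[word[0]].add(word)
    return groups

# Made-up sample text standing in for the contents of file.txt
sample = "Don't take them. The farm below... 'Army' ants, apes and arrows!"
groups = group_words(sample)
for letter in sorted(groups):
    print(letter + " - " + ", ".join(sorted(groups[letter])))
```

Sorting happens only at print time because sets are unordered; `sorted()` returns a new alphabetical list each time it is called.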
You have a number of problems.
Here is a short program that tries to solve the above issues:
import re, string
# instead of using "text = open(filename).read()" we exploit a piece
# of text contained in one of the imported modules
text = re.__doc__
# 1. how to split at once the text contained in the file
#
# credit to https://stackoverflow.com/a/13184791/2749397
p_ws = string.punctuation + string.whitespace
words = re.split('|'.join(re.escape(c) for c in p_ws), text)
# 2. how to instantiate a set when we do the first addition to a key,
# that is, using the .setdefault method of every dictionary
d = {}
# Note: words regularized by lowercasing, we skip the empty tokens
for word in (w.lower() for w in words if w):
    d.setdefault(word[0], set()).add(word)
# 3. how to print the sorted entries corresponding to each letter
for letter in sorted(d.keys()):
    print(letter, *sorted(d[letter]))
My text contains numbers, so numbers are found in the output (see below) of the above program; if you don't want numbers, filter them out with if letter not in '0123456789': print(...).
And here is the output...
0 0
1 1
8 8
9 9
a a above accessible after ailmsux all alphanumeric alphanumerics also an and any are as ascii at available
b b backslash be before beginning behaviour being below bit both but by bytes
c cache can case categories character characters clear comment comments compatibility compile complement complementing concatenate consist consume contain contents corresponding creates current
d d decimal default defined defines dependent digit digits doesn dotall
e each earlier either empty end equivalent error escape escapes except exception exports expression expressions
f f find findall finditer first fixed flag flags following for forbidden found from fullmatch functions
g greedy group grouping
i i id if ignore ignorecase ignored in including indicates insensitive inside into is it iterator
j just
l l last later length letters like lines list literal locale looking
m m made make many match matched matches matching means module more most multiline must
n n name named needn newline next nicer no non not null number
o object occurrences of on only operations optional or ordinary otherwise outside
p p parameters parentheses pattern patterns perform perl plus possible preceded preceding presence previous processed provides purge
r r range rather re regular repetitions resulting retrieved return
s s same search second see sequence sequences set signals similar simplest simply so some special specified split start string strings sub subn substitute substitutions substring support supports
t t takes text than that the themselves then they this those three to
u u underscore unicode us
v v verbose version versions
w w well which whitespace whole will with without word
x x
y yes yielding you
z z z0 za
Without comments, and with a little obfuscation, it's just 3 lines of code...
import re, string
text = re.__doc__
p_ws = string.punctuation + string.whitespace
words = re.split('|'.join(re.escape(c) for c in p_ws), text)
d, add2d = {}, lambda w: d.setdefault(w[0],set()).add(w) #1
for word in (w.lower() for w in words if w): add2d(word) #2
for abc in sorted(d.keys()): print(abc, *sorted(d[abc])) #3
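The hint about filtering out digits can also be folded into the grouping step. Here is a minimal sketch, assuming that only keys for which `str.isalpha()` is true should be kept. It reuses the same `re.split` pattern as above; the function name `index_by_letter` and the sample string are hypothetical and used for illustration only.

```python
import re
import string

def index_by_letter(text):
    """Map each initial letter to the sorted list of words starting with it."""
    # Same splitting pattern as above: one alternative per separator character
    p_ws = string.punctuation + string.whitespace
    pattern = '|'.join(re.escape(c) for c in p_ws)
    d = {}
    for word in re.split(pattern, text.lower()):
        # re.split yields empty strings between adjacent separators; skip
        # them, along with tokens that start with a digit or other non-letter
        if word and word[0].isalpha():
            d.setdefault(word[0], set()).add(word)
    # Dicts keep insertion order, so inserting sorted keys gives sorted output
    return {letter: sorted(d[letter]) for letter in sorted(d)}

# Made-up sample text containing numbers and punctuation
index = index_by_letter("Version 3.8 adds a walrus; see PEP 572, again.")
for letter, words in index.items():
    print(letter, *words)
```

Filtering while building the dictionary, instead of at print time, means the returned mapping never contains digit keys, so any later consumer can iterate over it without repeating the check.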