I'm trying to count all letters in a txt file, then display them in descending order

As the title says:

So far this is where I'm at. My code does work; however, I am having trouble displaying the information in order. Currently it just displays the information randomly.

def frequencies(filename):
    infile=open(filename, 'r')
    wordcount={}
    content = infile.read()
    infile.close()
    counter = {}
    invalid = "‘'`,.?!:;-_\n—' '"

    for word in content:
        word = content.lower()
        for letter in word:
            if letter not in invalid:
                if letter not in counter:
                    counter[letter] = content.count(letter)
                    print('{:8} appears {} times.'.format(letter, counter[letter]))

Any help would be greatly appreciated.

Dictionaries are not sorted by value: depending on the Python version they are either unordered or keep insertion order (guaranteed since Python 3.7), so printing entries as they are added will not give a descending order. Also, if you want to count items in a collection, it is better to use collections.Counter(), which is more optimized and Pythonic for this purpose.

Then you can just use Counter.most_common(N) to get the N most common items in your Counter object.
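
As a minimal illustration of the two calls mentioned above, here is a sketch that counts the characters of a short in-memory string. The sample word is an assumption chosen for the example, not part of the question.

```python
from collections import Counter

# Hypothetical sample text, used only for illustration.
sample = "banana"

counts = Counter(sample)          # maps each character to its number of occurrences
top_two = counts.most_common(2)   # the two most frequent characters, highest count first

print(top_two)  # [('a', 3), ('n', 2)]
```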

Also, regarding the opening of files, you can simply use the with statement, which closes the file automatically at the end of the block. And it is better not to print the final result inside your function; instead, you can make your function a generator by yielding the intended lines and then printing them whenever you want.

from collections import Counter

def frequencies(filename, top_n):
    with open(filename) as infile:
        content = infile.read()
    invalid = "‘'`,.?!:;-_\n—' '"
    # Count every character that is not in the invalid set
    counter = Counter(ch for ch in content if ch not in invalid)
    for letter, count in counter.most_common(top_n):
        yield '{:8} appears {} times.'.format(letter, count)

Then use a for loop to iterate over the generator function:

for line in frequencies(filename, 100):
    print(line)
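
To try this out without an existing text file, the following sketch writes a small temporary file and collects the generator's output into a list. The function is repeated so the example runs on its own; the sample text and the temporary file are assumptions made for the demonstration.

```python
import os
import tempfile
from collections import Counter

def frequencies(filename, top_n):
    # Same idea as the function above, repeated so this sketch is self-contained.
    with open(filename, encoding='utf-8') as infile:
        content = infile.read()
    invalid = "‘'`,.?!:;-_\n—' '"
    counter = Counter(ch for ch in content if ch not in invalid)
    for letter, count in counter.most_common(top_n):
        yield '{:8} appears {} times.'.format(letter, count)

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("aaa bb, c!")
    path = tmp.name

lines = list(frequencies(path, 2))   # a generator can also be collected into a list
os.remove(path)                      # clean up the temporary file

for line in lines:
    print(line)
```

Because the function yields lines instead of printing them, the caller decides whether to print, store, or further process the result.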

You don't need to iterate over 'words', and then over the letters in them. When you iterate over a string (like content), you already get single characters (strings of length 1). Then, you should wait until after your counting loop before showing any output. After counting, you could sort manually:

for letter, count in sorted(counter.items(), key=lambda x: x[1], reverse=True):
    # do stuff
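
Putting these points together, a minimal sketch of the manual approach might look like the following. The helper name count_letters and the sample text are assumptions introduced for illustration; in the question the text would come from the file.

```python
def count_letters(text, invalid="‘'`,.?!:;-_\n—' '"):
    """Return a dict mapping each lowercase character to its count."""
    counter = {}
    for letter in text.lower():      # iterating a string yields single characters
        if letter not in invalid:
            counter[letter] = counter.get(letter, 0) + 1
    return counter

# Hypothetical sample text standing in for the file content.
counter = count_letters("Hello, world!")

# Output happens only after counting has finished.
ordered = sorted(counter.items(), key=lambda x: x[1], reverse=True)
for letter, count in ordered:
    print('{:8} appears {} times.'.format(letter, count))
```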

However, it is better to use collections.Counter:

from collections import Counter

# content and invalid are the variables from the question's code
content = filter(lambda x: x not in invalid, content)
c = Counter(content)
for letter, count in c.most_common():  # descending order of counts
    print('{:8} appears {} times.'.format(letter, count))
# for letter, count in c.most_common(n):  # limit to the n most common
#     print('{:8} appears {} times.'.format(letter, count))

Displaying in descending order needs to happen outside your search loop; otherwise the letters will be displayed as they are encountered.

Sorting in descending order is quite easy using the built-in sorted (you'll need to set the reverse argument!).
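
As a small sketch of that sorted call, the example below sorts a dictionary of counts by value. The counts are made up for illustration, and the second call (negating the count to break ties alphabetically) is an optional variation, not something the question requires.

```python
# Hypothetical counts, used only for illustration.
counts = {'c': 2, 'b': 5, 'a': 2, 'd': 1}

# reverse=True puts the highest counts first; ties keep their original order.
by_count = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Optional variation: negate the count so that ties are broken alphabetically.
by_count_then_letter = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

print(by_count)              # [('b', 5), ('c', 2), ('a', 2), ('d', 1)]
print(by_count_then_letter)  # [('b', 5), ('a', 2), ('c', 2), ('d', 1)]
```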

However, Python comes with batteries included and there is already a Counter. So it could be as simple as:

from collections import Counter
from operator import itemgetter

def frequencies(filename):
    # Sets are especially optimized for fast lookups so this will be
    # a perfect fit for the invalid characters.
    invalid = set("‘'`,.?!:;-_\n—' '")

    # Using open in a with block makes sure the file is closed afterwards.
    with open(filename, 'r') as infile:  
        # The "char for char ...." is a conditional generator expression
        # that feeds all characters to the counter that are not invalid.
        counter = Counter(char for char in infile.read().lower() if char not in invalid)

    # If you want to display the values:
    for char, charcount in sorted(counter.items(), key=itemgetter(1), reverse=True):
        print(char, charcount)

Counter also has a most_common method. Called with an argument x, it returns only the x most common items; called without an argument, it returns every item in descending order of count, so it can replace the sorted call here as well.
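
To see how most_common relates to the explicit sorted call, the following sketch compares the two on a sample string (the string is an assumption for illustration). According to the Counter documentation, omitting the argument makes most_common return all elements.

```python
from collections import Counter
from operator import itemgetter

counter = Counter("abracadabra")   # hypothetical sample text

everything = counter.most_common()    # no argument: every item, highest count first
top_three = counter.most_common(3)    # only the three most common items
manual = sorted(counter.items(), key=itemgetter(1), reverse=True)

print(everything == manual)   # True
print(top_three)              # [('a', 5), ('b', 2), ('r', 2)]
```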

You can sort your dictionary at the time you print, with the built-in sorted function:

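lettercount = {}
invalid = "‘'`,.?!:;-_\n—' '"
# The with block closes the file once reading is done
with open('text.file') as infile:
    for c in infile.read().lower():
        if c not in invalid:
            lettercount[c] = lettercount.setdefault(c, 0) + 1
# Sort the letters by their count, highest first
for letter in sorted(lettercount, key=lettercount.get, reverse=True):
    print("{} appears {} times".format(letter, lettercount[letter]))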

Remark: I used the setdefault method to set the default value to 0 the first time a letter is encountered.
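
For comparison, here is a short sketch of setdefault next to two common alternatives for the same "start at 0" pattern, dict.get and collections.defaultdict. The sample text is an assumption; all three loops should produce the same counts.

```python
from collections import defaultdict

text = "hello"   # hypothetical sample text

# setdefault returns the existing value, or stores the default and returns it.
with_setdefault = {}
for c in text:
    with_setdefault[c] = with_setdefault.setdefault(c, 0) + 1

# dict.get performs the same lookup without storing the default first.
with_get = {}
for c in text:
    with_get[c] = with_get.get(c, 0) + 1

# defaultdict(int) creates missing keys with the value 0 on first access.
with_defaultdict = defaultdict(int)
for c in text:
    with_defaultdict[c] += 1

print(with_setdefault)   # {'h': 1, 'e': 1, 'l': 2, 'o': 1}
```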
