I'm trying to count all letters in a txt file, then display them in descending order

As the title says:

So far this is where I'm at. My code does work; however, I am having trouble displaying the information in order. Currently it just displays the information in an arbitrary order.

def frequencies(filename):
    infile=open(filename, 'r')
    wordcount={}
    content = infile.read()
    infile.close()
    counter = {}
    invalid = "‘'`,.?!:;-_\n—' '"

    for word in content:
        word = content.lower()
        for letter in word:
            if letter not in invalid:
                if letter not in counter:
                    counter[letter] = content.count(letter)
                    print('{:8} appears {} times.'.format(letter, counter[letter]))

Any help would be greatly appreciated.

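Dictionaries are not sorted by value: older Python versions give no ordering guarantee at all, and since Python 3.7 a dict only preserves insertion order. Also, if you want to count items within a set of data, you are better off using collections.Counter(), which is optimized and more Pythonic for this purpose.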

Then you can just use Counter.most_common(N) to get the N most common items in your Counter object and print them.
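
As a minimal sketch of what most_common returns, here is a Counter built from a short string instead of a file. The sample text and the variable names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical sample text standing in for the file contents.
sample = "banana"

counts = Counter(sample)         # tallies each character: a=3, n=2, b=1
top_two = counts.most_common(2)  # list of (item, count) pairs, highest count first

for letter, count in top_two:
    print('{:8} appears {} times.'.format(letter, count))
```

Each entry is a (letter, count) tuple, so it unpacks directly in a for loop.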

Also, regarding opening files, you can simply use the with statement, which closes the file automatically at the end of the block. And it's better not to print the final result inside your function; instead, you can make your function a generator by yielding the intended lines and then printing them whenever you want.

from collections import Counter

def frequencies(filename, top_n):
    with open(filename) as infile:
        content = infile.read()
    invalid = "‘'`,.?!:;-_\n—' '"
    # Lowercase first so 'A' and 'a' are counted together, then skip invalid characters.
    counter = Counter(char for char in content.lower() if char not in invalid)
    for letter, count in counter.most_common(top_n):
        yield '{:8} appears {} times.'.format(letter, count)

Then use a for loop to iterate over the generator:

for line in frequencies(filename, 100):
    print(line)

You don't need to iterate over 'words' and then over the letters in them. When you iterate over a string (like content), you already get single characters (strings of length 1). Then, you should wait until after your counting loop before showing any output. After counting, you could sort manually:

for letter, count in sorted(counter.items(), key=lambda x: x[1], reverse=True):
    # do stuff
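
Put together, the manual approach might look like the sketch below. The helper name count_letters, the sample text, and the reduced set of invalid characters are assumptions for illustration, not part of the original code.

```python
def count_letters(text, invalid):
    """Count the characters of text that are not in invalid (hypothetical helper)."""
    counter = {}
    for letter in text.lower():  # iterating over a string yields single characters
        if letter not in invalid:
            counter[letter] = counter.get(letter, 0) + 1
    return counter

counter = count_letters("Hello, World!", " ,.!")

# Output happens only after counting has finished, highest count first.
for letter, count in sorted(counter.items(), key=lambda x: x[1], reverse=True):
    print('{:8} appears {} times.'.format(letter, count))
```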

However, it is better to use collections.Counter:

from collections import Counter

content = filter(lambda x: x not in invalid, content)
c = Counter(content)
for letter, count in c.most_common():  # descending order of counts
    print('{:8} appears {} times.'.format(letter, count))
# for letter, count in c.most_common(n):  # limit to the n most common
#     print('{:8} appears {} times.'.format(letter, count))

Displaying the counts in descending order needs to happen outside your search loop; otherwise they will be displayed in the order they are encountered.

Sorting in descending order is quite easy using the built-in sorted (you'll need to set the reverse argument!).
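
One detail worth knowing: sorted is stable, so letters with equal counts keep the order they already had in the dictionary. If a deterministic order for ties is wanted, one option (an assumption about what output is desired, not something the question asks for) is to sort on a tuple of negated count and letter. The sample counts below are made up.

```python
# Hypothetical counts; 'a' and 'b' are tied.
counts = {'b': 2, 'a': 2, 'c': 5}

# Descending by count only; tied letters keep their dictionary order.
by_count = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Descending by count, then alphabetical among ties.
by_count_then_letter = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

print(by_count)
print(by_count_then_letter)
```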

However, Python is "batteries included" and there is already a Counter. So it could be as simple as:

from collections import Counter
from operator import itemgetter

def frequencies(filename):
    # Sets are especially optimized for fast lookups so this will be
    # a perfect fit for the invalid characters.
    invalid = set("‘'`,.?!:;-_\n—' '")

    # Using open in a with block makes sure the file is closed afterwards.
    with open(filename, 'r') as infile:  
        # The "char for char ...." is a conditional generator expression
        # that feeds all characters to the counter that are not invalid.
        counter = Counter(char for char in infile.read().lower() if char not in invalid)

    # If you want to display the values:
    for char, charcount in sorted(counter.items(), key=itemgetter(1), reverse=True):
        print(char, charcount)

The Counter also has a most_common method. Called without an argument, it returns every character ordered from the highest count to the lowest, so it could replace the sorted call here; called as most_common(x), it limits the result to the x most common characters.
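
For comparison, a small sketch placing the explicit sorted call next to most_common() with no argument, which returns all entries ordered from most to least common. The sample string is an assumption for illustration.

```python
from collections import Counter
from operator import itemgetter

counter = Counter("banana")  # hypothetical sample: a=3, n=2, b=1

# Explicit sort on the count (index 1 of each pair), highest first.
all_sorted = sorted(counter.items(), key=itemgetter(1), reverse=True)

# most_common() without an argument lists every entry, highest count first.
all_common = counter.most_common()

for char, charcount in all_common:
    print(char, charcount)
```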

You can sort your dictionary at the time you print it, with the built-in sorted function:

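lettercount = {}
invalid = "‘'`,.?!:;-_\n—' '"
# The with block closes the file once it has been read.
with open('text.file') as infile:
    content = infile.read().lower()
for c in content:
    if c not in invalid:
        lettercount[c] = lettercount.setdefault(c, 0) + 1
# Sort the letters by their count, highest first.
for letter in sorted(lettercount, key=lettercount.get, reverse=True):
    print("{} appears {} times".format(letter, lettercount[letter]))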

Note: I used the setdefault method to set the default value to 0 when a letter is met for the first time.
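
Two other common ways to handle the "first time we see this letter" case are dict.get with a default and collections.defaultdict. The sketch below uses a hypothetical sample string instead of a file; both variants should produce the same counts as the setdefault version.

```python
from collections import defaultdict

sample = "banana"  # hypothetical sample text

# Variant 1: dict.get returns 0 when the letter is not in the dict yet.
with_get = {}
for c in sample:
    with_get[c] = with_get.get(c, 0) + 1

# Variant 2: defaultdict(int) creates missing entries with the value 0.
with_defaultdict = defaultdict(int)
for c in sample:
    with_defaultdict[c] += 1

print(with_get)
print(dict(with_defaultdict))
```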

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 