
How to create a text frequency counter with all letters (a-z) in Python 3

I have a text file named textf.txt that looks something like the following:

rxgmgcwbd c qcyurr bkxgmq, lwrg grru rrwxtam rwgzwt am quyam cv avrrgdwkxgcr.iwxbdamcz xdalguj qarc ram av vcmfwgmgum. yw'g

I want to count the frequency of each letter in the text file, with the condition that a letter that does not appear in the text still gets a key:value pair with value 0. For example, if z is not in the text, the result should contain 'z': 0, and likewise for every letter from a to z. I wrote the following code:

import string  
from collections import Counter 
with open("textf.txt") as tf: 
    letter = tf.read()
letter_count = Counter(letter.translate(str.maketrans('','',string.punctuation)))
print("Frequency count of letter:","\n",letter_count)

But the output looks something like this:

Counter({' ': 110, 'r': 12, 'c': 88, 'a': 55, 'g': 57, 'w': 76, 'm': 76, 'x': 72, 'u': 70, 'q': 41, 'y': 40, 'j': 36, 'l': 32, 'b': 18, 'd': 28, 'v': 27, 'k': 22, 't': 19, 'f': 18, 'z': 16, 'i': 7})

I am trying to make it so that the space count ' ': 110 is not shown, that all the letters (a-z) are present, and that a letter which does not appear in the text is printed as something like 'n': 0. Any ideas or suggestions on how I could make this possible?

You can do this like so:

x = "rxgmgcwbd c qcyurr bkxgmq, lwrg grru rrwxtam rwgzwt am quyam cv avrrgdwkxgcr.iwxbdamcz xdalguj qarc ram av vcmfwgmgum. yw'g"

import string

freq = {i:0 for i in string.ascii_lowercase}
for i in x:
    if i in freq:
        freq[i] += 1

You can also replace the for-loop with a dictionary comprehension. It is less efficient for this task, because x.count scans the whole string once for every letter, but it is included here for reference:

freq = {i:x.count(i) for i in freq}

This gives the following result:

{'a': 9, 'c': 8, 'b': 3, 'e': 0, 'd': 4, 'g': 12, 'f': 1, 'i': 1, 'h': 0, 'k': 2, 'j': 1, 'm': 10, 'l': 2, 'o': 0, 'n': 0, 'q': 4, 'p': 0, 's': 0, 'r': 14, 'u': 5, 't': 2, 'w': 9, 'v': 4, 'y': 3, 'x': 6, 'z': 2}
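The loop above only counts lowercase letters, so an uppercase 'R' in the file would be skipped. If upper and lower case should be counted together, one possible sketch is to lower-case the text first. The helper name letter_frequencies is made up for illustration and is not part of the original answer:

```python
import string

def letter_frequencies(text):
    """Count a-z in text, case-insensitively, in a single pass."""
    # Start every letter at zero so absent letters still appear.
    freq = {letter: 0 for letter in string.ascii_lowercase}
    for ch in text.lower():
        if ch in freq:  # skips spaces, punctuation, digits, newlines
            freq[ch] += 1
    return freq

freq = letter_frequencies("Hello, World!")
print(freq["l"], freq["o"], freq["z"])  # 3 2 0
```

Because the membership test is against the dict itself, nothing outside a-z can ever become a key.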

One way to do this is to make a normal dict from your Counter, using the lowercase letters as the keys of the new dict. We use the dict.get method to supply a default value of zero for missing letters.

import string  
from collections import Counter 

letter = "rxgmgcwbd c qcyurr bkxgmq, lwrg grru rrwxtam rwgzwt am quyam cv avrrgdwkxgcr.iwxbdamcz xdalguj qarc ram av vcmfwgmgum. yw'g"

letter_count = Counter(letter.translate(str.maketrans('','',string.punctuation)))
letter_count = {k: letter_count.get(k, 0) for k in string.ascii_lowercase}
print("Frequency count of letter:\n", letter_count)

Output:

Frequency count of letter:
 {'a': 9, 'b': 3, 'c': 8, 'd': 4, 'e': 0, 'f': 1, 'g': 12, 'h': 0, 'i': 1, 'j': 1, 'k': 2, 'l': 2, 'm': 10, 'n': 0, 'o': 0, 'p': 0, 'q': 4, 'r': 14, 's': 0, 't': 2, 'u': 5, 'v': 4, 'w': 9, 'x': 6, 'y': 3, 'z': 2}

If you do this in Python 3.6+ you get the side-benefit that the new dict keeps its keys in insertion order, which here is alphabetical. In CPython 3.6 that behaviour was an implementation detail that should not be relied upon; from Python 3.7 onwards insertion order is guaranteed by the language.


As user2357112 mentions in the comments, we don't need to use letter_count.get(k, 0), since a Counter automatically returns zero if we try to read the value of a non-existent key. So that dict comprehension can be changed to:

letter_count = {k: letter_count[k] for k in string.ascii_lowercase}
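A quick way to see that behaviour, using a short sample string rather than the text from the question: reading a missing key from a Counter returns 0 but does not add the key, which is why printing the Counter directly still leaves the absent letters out.

```python
from collections import Counter

counts = Counter("hello")

print(counts["z"])    # 0: a missing key reads as zero
print("z" in counts)  # False: the lookup did not create the key
```

That is the reason the dict comprehension over string.ascii_lowercase is still needed to make the zero entries visible.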

You can initialise your Counter() with a dictionary. In this case a dictionary comprehension is used to initialize all the lowercase letters to zero.

Using update() with the letter will then add to these existing values:

import string
from collections import Counter

letter = "hello world "
letter_counts = Counter({l:0 for l in string.ascii_lowercase})
letter_counts.update(letter.translate(str.maketrans('','',string.punctuation + ' ')))

print(letter_counts)

Giving you:

Counter({'l': 3, 'o': 2, 'd': 1, 'w': 1, 'h': 1, 'r': 1, 'e': 1, 'p': 0, 'c': 0, 'j': 0, 'x': 0, 't': 0, 'g': 0, 'n': 0, 'f': 0, 'u': 0, 'm': 0, 'q': 0, 'z': 0, 's': 0, 'y': 0, 'a': 0, 'b': 0, 'i': 0, 'k': 0, 'v': 0})

To get rid of the space, add it to the punctuation string, as the code above does with string.punctuation + ' '.
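Stripping punctuation and spaces still lets digits, newlines and uppercase letters through, and text read from a file usually ends in a newline. A variation that may be more robust is to keep only the characters you want instead of deleting the ones you don't. This is a sketch under that assumption, with a made-up sample string:

```python
import string
from collections import Counter

text = "Hello World 42!\n"
letters = set(string.ascii_lowercase)

# Seed every letter with zero, then count only characters in a-z.
letter_counts = Counter(dict.fromkeys(string.ascii_lowercase, 0))
letter_counts.update(ch for ch in text.lower() if ch in letters)

print(letter_counts["l"], letter_counts["z"], len(letter_counts))
```

The Counter then holds exactly 26 keys, whatever else the input contains.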

How about:

import string
from collections import defaultdict

row="rxgmgcwbd c qcyurr bkxgmq, lwrg grru rrwxtam rwgzwt am quyam cv avrrgdwkxgcr.iwxbdamcz xdalguj qarc ram av vcmfwgmgum. yw'g"
letters = string.ascii_lowercase
stats = defaultdict(int)  # int, not list: a missing key must default to 0 so that += 1 works
for l in letters:
    stats[l]=0
for l in row:
    if l.isalpha():
        stats[l]+=1

You can use dict.fromkeys to initialise a dictionary with a default value of 0 for every letter, and then update that dictionary:

import string

x = "rxgmgcwbd c qcyurr bkxgmq, lwrg grru rrwxtam rwgzwt am quyam cv avrrgdwkxgcr.iwxbdamcz xdalguj qarc ram av vcmfwgmgum. yw'g"

letter_count = dict.fromkeys(string.ascii_lowercase, 0)
for c in x:
    if c in string.ascii_lowercase:
        letter_count[c] += 1
print(letter_count)

Output:

{'a': 9, 'b': 3, 'c': 8, 'd': 4, 'e': 0, 'f': 1, 'g': 12, 'h': 0, 'i': 1, 'j': 1, 'k': 2, 'l': 2, 'm': 10, 'n': 0, 'o': 0, 'p': 0, 'q': 4, 'r': 14, 's': 0, 't': 2, 'u': 5, 'v': 4, 'w': 9, 'x': 6, 'y': 3, 'z': 2}
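Since the question reads the text from a file, the same dict.fromkeys idea can be wrapped in a function that takes a path. This is only a sketch: the function name count_letters_in_file is made up, UTF-8 encoding is an assumption, and a temporary file stands in for textf.txt so the example runs on its own.

```python
import os
import string
import tempfile

def count_letters_in_file(path):
    """Return a dict mapping every letter a-z to its count in the file."""
    letter_count = dict.fromkeys(string.ascii_lowercase, 0)
    with open(path, encoding="utf-8") as tf:  # encoding is an assumption
        for line in tf:  # line by line, so large files are not loaded whole
            for c in line:
                if c in letter_count:
                    letter_count[c] += 1
    return letter_count

# Throwaway sample file standing in for textf.txt.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("rxgmgcwbd c qcyurr\nbkxgmq, lwrg\n")
    sample_path = tmp.name

result = count_letters_in_file(sample_path)
os.remove(sample_path)
print(result["r"], result["e"])  # 4 0
```

With the real file you would call count_letters_in_file("textf.txt") directly.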

"... that my result prints something like ..."

The other answers focus on choosing a different data structure, but to me it sounds like you already chose the right data structure, Counter, and just want to display the result nicely. So something like this would do:

display_str = "{" + ", ".join("'{}': {}".format(x, letter_count[x]) for x in string.ascii_lowercase) + "}"
print("Frequency count of letter:", display_str, sep="\n")
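If a dict-like single line is not required, another display option is one letter per line. This sketch assumes Python 3.6+ for f-strings and uses a short sample Counter in place of the one built from the file:

```python
import string
from collections import Counter

letter_count = Counter("hello world")

# Counter returns 0 for letters it has never seen, so every a-z line appears.
lines = [f"{letter}: {letter_count[letter]}" for letter in string.ascii_lowercase]
print("\n".join(lines))
```

Either way the Counter itself is left untouched; only the presentation changes.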
