
Count every character from file

I am trying to count every character in a file and store the counts in a dictionary, but it doesn't quite work: I don't get all the characters.

#!/usr/bin/env python
import os,sys

def count_chars(p):
     indx = {}
     file = open(p)

     current = 0
     for ch in file.readlines():
          c = ch[current:current+1]
          if c in indx:
               indx[c] = indx[c]+1
          else:
               indx[c] = 1           
          current+=1
     print indx

if len(sys.argv) > 1:
     for e in sys.argv[1:]:
          print e, "contains:"
          count_chars(e)
else:
     print "[#] Usage: ./aufg2.py <filename>"

Assuming the file you're counting fits reasonably in memory:

import collections
with open(p) as f:
    indx = collections.Counter(f.read())

Otherwise, you can read it in fixed-size chunks:

import collections
with open(p) as f:
    indx = collections.Counter()
    buffer = f.read(1024)
    while buffer:
        indx.update(buffer)
        buffer = f.read(1024)
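
As a quick sanity check, the two variants above should produce identical counts. The following is only a minimal sketch: the helper names `count_whole` and `count_chunked`, the temporary sample file, and the tiny chunk size are assumptions made for illustration, not part of the original answer.

```python
import collections
import os
import tempfile


def count_whole(path):
    # Read the entire file at once and count its characters.
    with open(path) as f:
        return collections.Counter(f.read())


def count_chunked(path, chunk_size=1024):
    # Read fixed-size chunks until read() returns an empty string.
    counts = collections.Counter()
    with open(path) as f:
        chunk = f.read(chunk_size)
        while chunk:
            counts.update(chunk)
            chunk = f.read(chunk_size)
    return counts


# Write a small sample file so the sketch runs on its own.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("hello\nworld\n")

whole = count_whole(sample_path)
chunked = count_chunked(sample_path, chunk_size=4)  # tiny chunks force several reads
os.remove(sample_path)
```

Note that newline characters are counted like any other character in both versions.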

The main problem is that you only examine (at most!) one character from every line. If you're reading the file line by line, you need to have an inner loop that would iterate over the line's characters.

#!/usr/bin/env python
import os, sys, collections

def count_chars(p):
     indx = collections.Counter()
     with open(p) as f:
         for line in f:
             for c in line:
                 indx[c] += 1
     print indx

if len(sys.argv) > 1:
     for e in sys.argv[1:]:
          print e, "contains:"
          count_chars(e)
else:
     print "[#] Usage: ./aufg2.py <filename>"
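
To see the difference concretely, here is a small sketch that runs the question's indexing logic and the nested loop side by side on the same lines. The in-memory `lines` list is a hypothetical stand-in for what `readlines()` would return.

```python
import collections

# Hypothetical stand-in for file.readlines() on a three-line file.
lines = ["abc\n", "def\n", "ghi\n"]

# The question's logic: one slice per line, at an offset that grows by one.
buggy = {}
current = 0
for line in lines:
    c = line[current:current + 1]
    buggy[c] = buggy.get(c, 0) + 1
    current += 1

# The nested loop: every character of every line is visited.
fixed = collections.Counter()
for line in lines:
    for c in line:
        fixed[c] += 1
```

With this input the first version only ever sees 'a', 'e' and 'i' (one character per line), while the second counts all twelve characters, newlines included. Once the offset runs past the end of a line, the slice is an empty string, so the first version would start counting '' instead of real characters.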

Use a defaultdict. If you look up a key that doesn't exist in a defaultdict, it creates the key and calls the factory function passed as the first constructor argument (here int, which returns 0) to produce the initial value.

import collections

def count_chars(p):
    d = collections.defaultdict(int)
    # Use a with block so the file is closed once it has been read.
    with open(p) as f:
        for letter in f.read():
            d[letter] += 1
    return d
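
The missing-key behaviour can be seen without a file. This is a minimal sketch on a literal string; the sample word and the `missing` variable are only for illustration.

```python
import collections

d = collections.defaultdict(int)

# int() returns 0, so a key seen for the first time starts at 0
# and the += 1 then brings it to 1.
for letter in "banana":
    d[letter] += 1

# Merely reading a missing key also inserts it with the default value.
missing = d["z"]
```

One consequence worth keeping in mind: because lookups insert keys, checking `d["z"]` leaves a `"z": 0` entry behind, whereas `"z" in d` or `d.get("z")` does not.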

I've posted this as a comment to @Amber's answer, but will reiterate here...

To count the occurrences of bytes in a file, you can build a small iterator:

from collections import Counter

with open('file') as fin:
    chars = iter(lambda: fin.read(1), '')
    counts = Counter(chars)

This way the underlying buffering from fin still applies, while the code makes it plain that you're reading one byte at a time (rather than a block size, which the OS handles on its own anyway). It also avoids calling update on the Counter object, so the whole count becomes a single, stand-alone expression.
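
The two-argument form of `iter` calls the given function repeatedly and stops as soon as it returns the sentinel value. Here is a minimal sketch of that pattern, using `io.StringIO` as a stand-in for an open text file (an assumption made so the example runs without a file on disk):

```python
import io
from collections import Counter

# io.StringIO stands in for an open text file; read(1) returns ''
# at end of input, which matches the sentinel and ends the iteration.
fin = io.StringIO(u"abca")
chars = iter(lambda: fin.read(1), u'')
counts = Counter(chars)
```

If the file is opened in binary mode under Python 3, `read(1)` returns `b''` at the end, so the sentinel would need to be `b''` instead of `''`; otherwise the iterator never stops.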
