Count every character from file

Question

I am trying to count every character in a file and store the counts in a dictionary, but it doesn't quite work: I don't get all the characters.

#!/usr/bin/env python
import os,sys

def count_chars(p):
     indx = {}
     file = open(p)

     current = 0
     for ch in file.readlines():
          c = ch[current:current+1]
          if c in indx:
               indx[c] = indx[c]+1
          else:
               indx[c] = 1           
          current+=1
     print indx

if len(sys.argv) > 1:
     for e in sys.argv[1:]:
          print e, "contains:"
          count_chars(e)
else:
     print "[#] Usage: ./aufg2.py <filename>"

Answer 1

Assuming the file you're counting fits reasonably in memory:

import collections
with open(p) as f:
    indx = collections.Counter(f.read())

Otherwise, you can read it in fixed-size chunks:

import collections
with open(p) as f:
    indx = collections.Counter()
    chunk = f.read(1024)
    while chunk:  # read() returns an empty string at end of file
        indx.update(chunk)  # adds one count per character in the chunk
        chunk = f.read(1024)
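
As a rough check that the two approaches agree, here is a minimal sketch that wraps the chunked loop in a function and compares it with the read-everything version. The function name count_chars_chunked, the chunk_size parameter and the throwaway temp file are my own additions for illustration, not part of the answer above.

```python
import collections
import os
import tempfile


def count_chars_chunked(path, chunk_size=1024):
    """Count characters by reading the file in fixed-size chunks."""
    counts = collections.Counter()
    with open(path) as f:
        chunk = f.read(chunk_size)
        while chunk:              # read() returns '' at end of file
            counts.update(chunk)  # one count per character in the chunk
            chunk = f.read(chunk_size)
    return counts


# Write a small throwaway file so the sketch can be run as-is.
fd, tmp_path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as tmp:
    tmp.write("hello\nworld\n")

# Count once by reading the whole file, once in deliberately tiny chunks.
with open(tmp_path) as f:
    whole = collections.Counter(f.read())
chunked = count_chars_chunked(tmp_path, chunk_size=4)

os.remove(tmp_path)
```

Both Counters should come out equal, since a chunk boundary only changes how many characters are passed to update at a time, not which characters are seen.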

Answer 2

The main problem is that you only examine (at most!) one character from every line. If you're reading the file line by line, you need to have an inner loop that would iterate over the line's characters.

#!/usr/bin/env python
import sys, collections

def count_chars(p):
    indx = collections.Counter()
    with open(p) as f:
        for line in f:
            for c in line:
                indx[c] += 1
    print indx

if len(sys.argv) > 1:
    for e in sys.argv[1:]:
        print e, "contains:"
        count_chars(e)
else:
    print "[#] Usage: ./aufg2.py <filename>"
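
To make the "one character per line" point concrete, here is a small sketch that replays the question's indexing on a hard-coded list of lines. The list lines and the names examined and all_chars are made up for the illustration; the list simply stands in for what file.readlines() would return.

```python
lines = ["abc\n", "def\n", "ghi\n"]   # stands in for file.readlines()

# What the question's loop looks at: one slice per line, with the
# slice position moving one step to the right on every line.
examined = []
current = 0
for line in lines:
    examined.append(line[current:current + 1])
    current += 1

# What an inner loop over each line looks at: every character.
all_chars = [c for line in lines for c in line]

# Once current runs past the end of a line, the slice is simply empty,
# which is why "at most" one character per line gets counted.
past_end = "abc\n"[10:11]
```

Here examined ends up holding only the diagonal 'a', 'e', 'i', while all_chars holds all twelve characters, newlines included.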

Answer 3

Use a defaultdict. If you look up a key that doesn't exist yet, a defaultdict creates the key and calls the factory passed as the first constructor argument (here int, which returns 0) to produce its initial value.

import collections

def count_chars(p):
    d = collections.defaultdict(int)
    with open(p) as f:  # closes the file once counting is done
        for letter in f.read():
            d[letter] += 1
    return d
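
A quick sketch of the defaultdict behaviour this relies on, using a plain string instead of a file so it runs on its own. The variable names are only for illustration.

```python
import collections

d = collections.defaultdict(int)

missing_before = "x" in d   # False: nothing stored yet
value = d["x"]              # the lookup calls int() and stores 0 under "x"
present_after = "x" in d    # True: the lookup itself created the key

# The same mechanism lets += work on a key that has never been seen.
for letter in "banana":
    d[letter] += 1
```

Note the side effect: merely reading d["x"] inserts the key, so membership tests should use in rather than indexing if you don't want stray zero entries.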

Answer 4

I've posted this as a comment on @Amber's answer, but I'll reiterate it here...

To count the occurrences of bytes in a file, generate a small iterator:

from collections import Counter

with open('file') as fin:
    chars = iter(lambda: fin.read(1), '')
    counts = Counter(chars)

This way the underlying buffering of fin still applies, but the code reads as consuming one byte at a time (rather than a block of some size, which the OS does on its own anyway). It also avoids calling update on the Counter object, so the whole thing becomes a single, stand-alone statement.
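
For anyone unfamiliar with the two-argument form of iter, here is a minimal sketch of the same idea run against an in-memory stream rather than a real file. The io.StringIO object and its contents are stand-ins I've assumed for the example.

```python
import io
from collections import Counter

fin = io.StringIO(u"aab\n")   # in-memory stand-in for an open text file

# iter(callable, sentinel) calls the callable repeatedly and stops as soon
# as it returns the sentinel; read(1) returns '' once the input is exhausted.
chars = iter(lambda: fin.read(1), u"")
counts = Counter(chars)

leftover = fin.read(1)        # the stream is fully consumed by now
```

If the file were opened in binary mode instead, the sentinel would presumably need to be an empty bytes object rather than an empty string, since that is what read(1) returns at end of file in that mode.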

4 answers

solution1
8 ACCEPTED 2013-01-05 21:13:50

solution2
2 2013-01-05 21:09:28

solution3
1 2013-01-05 21:16:45

solution4
1 2013-01-05 21:40:32
