Count letter frequency in a file and write it to an output file (Python)

Question

I am writing a function that takes an in_file, checks the frequency of the letters in that file, and writes them in the format letter:frequency to an out_file. This is what I have so far. Can anyone help?

def count_letters(in_file,out_file):
    in_file = open(in_file,"r")
    out_file = open(out_file,"w")
    for line in in_file:
        words = line.split()
        for word in words:
            for letter in word:
                print(letter,':',line.count(letter),file=out_file,end="\n")

Answer 1

There is no need to split the line into words at all; passing a string directly to the counter updates the counts per character. You also need to collect all counts first, and only then write them to the output file:

from collections import Counter

def count_letters(in_filename, out_filename):
    counts = Counter()
    with open(in_filename, "r") as in_file:
        # Read in 8 KB chunks until read() returns '' at end of file.
        for chunk in iter(lambda: in_file.read(8192), ''):
            counts.update(chunk)  # counts each character of the chunk
    with open(out_filename, "w") as out_file:
        # items() is the Python 3 spelling; Python 2 would use iteritems().
        for letter, count in counts.items():
            out_file.write('{}:{}\n'.format(letter, count))

Note that the input file is processed in 8 KB chunks rather than in one go; you can adjust the block size (preferably a power of 2) to maximize throughput.
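
As a rough illustration of why the chunk size does not affect the result, here is a minimal sketch that reads from an in-memory stream instead of a real file. The helper name count_chars, the chunk_size parameter, and the use of io.StringIO as a stand-in for an open file are assumptions made for this example, not part of the answer above.

```python
import io
from collections import Counter

def count_chars(stream, chunk_size=8192):
    """Count every character in a text stream, reading chunk_size characters at a time."""
    counts = Counter()
    # iter(callable, sentinel) keeps calling read() until it returns '' (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), ''):
        counts.update(chunk)  # a string argument is counted character by character
    return counts

# A tiny chunk size forces several reads; a large one reads everything at once.
small = count_chars(io.StringIO("hello world"), chunk_size=4)
whole = count_chars(io.StringIO("hello world"), chunk_size=1024)
print(small == whole)
print(small["l"], small["o"])
```

Both calls should produce the same counts, since the Counter simply accumulates across chunks.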

You could iterate over counts.most_common() in the write loop instead if you want the output file to be sorted by frequency (descending).
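
A small sketch of the sorted variant, assuming the same letter:count line format. The helper name format_counts is hypothetical and only used here to show the ordering that Counter.most_common() yields.

```python
from collections import Counter

def format_counts(counts):
    """Return 'letter:count' lines, most frequent first."""
    # most_common() with no argument returns all (element, count) pairs,
    # ordered from the highest count to the lowest.
    return ['{}:{}'.format(letter, count) for letter, count in counts.most_common()]

lines = format_counts(Counter("banana"))
print("\n".join(lines))
```

For the input "banana" this lists a first (3 occurrences), then n (2), then b (1).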

Answer 2

This should do the trick. As written it counts only letters and skips every other character; remove the isalpha() check if you want to count all characters:

from collections import Counter

def count_letters(in_filename, out_filename):
    letter_counts = Counter()
    # Open each file exactly once; the with blocks close them automatically.
    with open(in_filename, 'r') as in_file:
        for line in in_file:
            line = line.strip()
            for letter in line:
                # Count only letters.
                if not letter.isalpha():
                    continue
                letter_counts[letter] += 1

    with open(out_filename, 'w') as out_file:
        # items() is the Python 3 spelling; Python 2 would use iteritems().
        for letter, count in letter_counts.items():
            out_file.write('{}:{}\n'.format(letter, count))

Answer 1: score 4, accepted, 2013-08-10 18:31:48

Answer 2: score 0, 2013-08-10 18:31:23
