Python: count how many types of characters are in a file

I'm new to Python and am writing a script that does a lot of file I/O. One function is supposed to count how many of the character types in [OHCN] appear in a file, not how many occurrences there are. For example, if a file contains "OOOOOHHHHNNN", the result would be 3. Here is what I have; is there a better and more efficient way of doing this?

One more question: I do a lot of file editing in this script, and at the moment I have several functions that each open the files that need to be modified. Would it be more efficient to handle everything in one function (open the file once and do everything I need in it), or to have each function open the file, do its work, and close it before the next function opens it again? Thank you for any help.

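def ReadFile(xyzfile, inputFile):

    key_atoms = "OHCN"
    # Read the whole coordinate file into memory
    with open(xyzfile) as f:
        text = f.read()

    # Count the occurrences of each key atom
    atom_count = {ltr: 0 for ltr in key_atoms}
    for char in text:
        if char in key_atoms:
            atom_count[char] += 1

    # Number of distinct atom types that actually appear
    ntypes = sum(1 for n in atom_count.values() if n > 0)

    # Substitute the count into the input file's text
    with open(inputFile) as f:
        s = f.read()
    s = s.replace("ntyp = 2", "ntyp = {}".format(ntypes))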

If you're after the number of distinct atom types (characters) present, you can build a set of the key characters and intersect it with the characters in the file. Iterating over the file line by line means the whole file is never read into memory, and itertools.chain.from_iterable flattens the lines into a stream of characters, which replaces a nested loop. Using a single with statement for both files also gives an all-or-nothing approach: if xyzfile and input_file can't both be opened, there is no point in proceeding anyway. Your current code looks like it can be reduced to:

from itertools import chain

with open(xyzfile) as f1, open(input_file) as f2:
    atom_count = len(set('OHCN').intersection(chain.from_iterable(f1)))
    s = f2.read().replace('ntyp = 2', 'ntyp = {}'.format(atom_count))

Your replacement could probably be made more efficient, but you haven't said what s is used for afterwards.
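As a self-contained illustration of the set-intersection idea, here is a minimal sketch wrapped in functions. The names count_atom_types and render_input are hypothetical, and the 'ntyp = 2' placeholder is taken from the question. Because the match is per character, a two-letter symbol such as "Cl" would also count as a C, so this assumes the file contains only single-letter symbols of interest.

```python
from itertools import chain


def count_atom_types(path, key_atoms="OHCN"):
    """Return how many of the characters in key_atoms occur at least once in the file."""
    with open(path) as f:
        # chain.from_iterable(f) yields the file one character at a time,
        # line by line, so the whole file is never held in memory.
        return len(set(key_atoms).intersection(chain.from_iterable(f)))


def render_input(template_text, ntyp, placeholder="ntyp = 2"):
    """Return template_text with the placeholder replaced by the computed count."""
    return template_text.replace(placeholder, "ntyp = {}".format(ntyp))
```

For a file containing "OOOOOHHHHNNN", count_atom_types returns 3, since O, H and N are present and C is not.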

counts = {}
with open(infilepath) as infile:
    for line in infile:
        for char in line:
            if char not in counts:
                counts[char] = 0
            counts[char] += 1

print("There are", len(counts), "different characters in the file")
for key in counts:
    print("There are", counts[key], "occurrences of", key, "in the file")
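If only the O, H, C and N types matter, the same counting approach can be narrowed down afterwards. The following is a rough sketch using collections.Counter from the standard library; the function names count_characters and types_present are made up for illustration.

```python
from collections import Counter


def count_characters(path):
    """Count every character in the file, reading it one line at a time."""
    counts = Counter()
    with open(path) as infile:
        for line in infile:
            # Counter.update with a string adds one per character
            counts.update(line)
    return counts


def types_present(counts, key_atoms="OHCN"):
    """Return the key atoms that occur at least once, in key_atoms order."""
    return [atom for atom in key_atoms if counts[atom] > 0]
```

Here len(types_present(counts)) gives the number of atom types, while counts[atom] still holds the number of occurrences if that is needed later.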
