Find all spaces, newlines and tabs in a Python file

def count_spaces(filename): 
    input_file = open(filename,'r') 
    file_contents = input_file.read() 
    space = 0 
    tabs = 0 
    newline = 0 
    for line in file_contents == " ": 
        space +=1 
        return space
    for line in file_contents == '\t': 
        tabs += 1 
        return tabs 
    for line in file_contents == '\n': 
        newline += 1
        return newline 
    input_file.close()

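I'm trying to write a function that takes a filename as an argument and returns the total number of spaces, newlines and tabs in the file. I'd like to use a basic for loop and if statements, but I'm struggling at the moment :/ Any help would be greatly appreciated.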

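Your current code doesn't work because you're combining the loop syntax ( for x in y ) with a conditional test ( x == y ) in a single muddled statement. You need to separate them.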

You also need to use just one return statement. Otherwise the first one you reach stops the function, and the other values are never returned.

Try:

for character in file_contents:
    if character == " ":
        space +=1
    elif character == '\t': 
        tabs += 1
    elif character == '\n': 
        newline += 1
return space, tabs, newline
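
Combined with the file handling from the question, the loop above might look like the following minimal sketch. It assumes the file is small enough to read in one go, and it swaps the explicit open/close for a with block so the file is closed even if an error occurs. The variable names follow the question's code.

```python
def count_spaces(filename):
    """Return (spaces, tabs, newlines) counted with a plain for loop."""
    space = 0
    tabs = 0
    newline = 0
    # The with block closes the file automatically.
    with open(filename, 'r') as input_file:
        file_contents = input_file.read()
    # Iterating over a string yields one character at a time.
    for character in file_contents:
        if character == " ":
            space += 1
        elif character == '\t':
            tabs += 1
        elif character == '\n':
            newline += 1
    # A single return at the end hands back all three counts.
    return space, tabs, newline
```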

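The code in Joran Beasley's answer is a more Pythonic approach to the problem. Rather than writing a separate condition for each kind of character, you can use collections.Counter to count the occurrences of every character in the file, then pull out the counts for the whitespace characters at the end. A Counter works much like a dictionary.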

from collections import Counter

def count_spaces(filename):
    with open(filename) as in_f:
        text = in_f.read()
    count = Counter(text)
    return count[" "], count["\t"], count["\n"]
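
To illustrate the dictionary-like behaviour, here is a small sketch on an in-memory string (the sample text is made up for illustration). Unlike a plain dict, a Counter returns 0 for a key it has never seen, so looking up the tab count is safe even when the text contains no tabs.

```python
from collections import Counter

sample = "a b\nc d\n"          # hypothetical sample text
count = Counter(sample)

spaces = count[" "]            # 2
newlines = count["\n"]         # 2
tabs = count["\t"]             # 0: a missing key yields 0 instead of raising KeyError
```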

To support large files, you could read a fixed number of bytes at a time:

#!/usr/bin/env python
from collections import namedtuple

Count = namedtuple('Count', 'nspaces ntabs nnewlines')

def count_spaces(filename, chunk_size=1 << 13):
    """Count number of spaces, tabs, and newlines in the file."""
    nspaces = ntabs = nnewlines = 0
    # assume ascii-based encoding and b'\n' newline
    with open(filename, 'rb') as file:
        chunk = file.read(chunk_size)
        while chunk:
            nspaces   += chunk.count(b' ')
            ntabs     += chunk.count(b'\t')
            nnewlines += chunk.count(b'\n')
            chunk = file.read(chunk_size)
    return Count(nspaces, ntabs, nnewlines)

if __name__ == "__main__":
    print(count_spaces(__file__))

Output

Count(nspaces=150, ntabs=0, nnewlines=20)
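
Because each character being counted is a single byte, a match can never straddle a chunk boundary, so the per-chunk counts simply add up. The sketch below checks this on in-memory data, using io.BytesIO in place of a real file. count_bytes_chunked is a hypothetical helper written for this illustration, not part of the answer above.

```python
import io

def count_bytes_chunked(stream, chunk_size):
    """Count spaces, tabs and newlines in a binary stream, chunk by chunk."""
    nspaces = ntabs = nnewlines = 0
    # iter() with a sentinel keeps calling read() until it returns b''.
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        nspaces += chunk.count(b' ')
        ntabs += chunk.count(b'\t')
        nnewlines += chunk.count(b'\n')
    return nspaces, ntabs, nnewlines

data = b"one two\tthree\nfour  five\n"
# Totals from counting the whole buffer at once.
whole = (data.count(b' '), data.count(b'\t'), data.count(b'\n'))
# The same totals should come back for any positive chunk size.
results = [count_bytes_chunked(io.BytesIO(data), n) for n in (1, 3, 8192)]
```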

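mmap allows you to treat a file as a bytestring without actually loading the whole file into memory. For example, you could search it for a regular expression pattern: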

#!/usr/bin/env python3
import mmap
import re
from collections import Counter, namedtuple

Count = namedtuple('Count', 'nspaces ntabs nnewlines')

def count_spaces(filename, chunk_size=1 << 13):
    """Count number of spaces, tabs, and newlines in the file."""
    nspaces = ntabs = nnewlines = 0
    # assume ascii-based encoding and b'\n' newline
    with open(filename, 'rb', 0) as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
        c = Counter(m.group() for m in re.finditer(br'[ \t\n]', s))
        return Count(c[b' '], c[b'\t'], c[b'\n'])

if __name__ == "__main__":
    print(count_spaces(__file__))

Output

Count(nspaces=107, ntabs=0, nnewlines=18)

from collections import Counter

C = Counter(open(afile).read())
C[' ']

In my case a tab (\t) had been converted to "    " (four spaces), so I modified the logic slightly to handle that.

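def count_spaces(filename):
    with open(filename, "r") as f1:
        contents = f1.readlines()

    total_tab = 0
    total_space = 0
    total_newline = 0
    for line in contents:
        # A run of four spaces is treated as one (converted) tab.
        total_tab += line.count("    ")
        total_tab += line.count("\t")
        # Counts every space, including those inside four-space runs.
        total_space += line.count(" ")
        # Count actual newline characters; the last line may not end with one.
        total_newline += line.count("\n")
    print("Space count = ", total_space)
    print("Tab count = ", total_tab)
    print("New line count = ", total_newline)
    return total_space, total_tab, total_newline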
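
When a run of four spaces is counted as a tab, those same four spaces are also picked up by line.count(" "). If that double counting is not wanted, one possible variant subtracts the spaces already attributed to tabs. This is an assumption about the intended behaviour, not the answer's own code; the function name count_whitespace and its text argument are hypothetical.

```python
def count_whitespace(text):
    """Return (spaces, tabs, newlines), treating each run of four spaces as one tab."""
    total_space = total_tab = 0
    for line in text.splitlines(keepends=True):
        # str.count counts non-overlapping matches, so each space belongs to at most one run.
        four_space_runs = line.count("    ")
        total_tab += four_space_runs + line.count("\t")
        # Only the spaces left over after removing the four-space runs count as spaces.
        total_space += line.count(" ") - 4 * four_space_runs
    return total_space, total_tab, text.count("\n")

print(count_whitespace("    x = 1\n\ty = 2\n"))  # prints (4, 2, 2)
```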

