Find all spaces, newlines, and tabs in a Python file

Question

def count_spaces(filename): 
    input_file = open(filename,'r') 
    file_contents = input_file.read() 
    space = 0 
    tabs = 0 
    newline = 0 
    for line in file_contents == " ": 
        space +=1 
        return space
    for line in file_contents == '\t': 
        tabs += 1 
        return tabs 
    for line in file_contents == '\n': 
        newline += 1
        return newline 
    input_file.close()

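I am trying to write a function that takes a filename as an argument and returns the total number of spaces, newlines, and tab characters in the file. I want to use a basic for loop and if statements, but I'm struggling at the moment :/ Any help would be greatly appreciated.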

Answer 1

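Your current code doesn't work because you're combining loop syntax (for x in y) with a conditional test (x == y) in a single muddled statement. You need to separate the two.

You also need to use only one return statement. Otherwise the first one you reach stops the function, and the other values are never returned.

Try: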

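def count_spaces(filename):
    space = tabs = newline = 0
    with open(filename, 'r') as input_file:
        file_contents = input_file.read()
    # Loop over every character and test each one separately.
    for character in file_contents:
        if character == " ":
            space += 1
        elif character == '\t':
            tabs += 1
        elif character == '\n':
            newline += 1
    # A single return, placed after the loop has finished.
    return space, tabs, newline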
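
The point about return can be seen in isolation with a small sketch. Both function names below are hypothetical and work on a string rather than a file; the first mirrors the structure of the question's code, the second returns once after the loop.

```python
def first_return_wins(text):
    """Hypothetical helper: returns from inside the loop, as in the question."""
    space = 0
    for character in text:
        if character == " ":
            space += 1
            return space  # leaves the function at the first space found
    return space


def count_all(text):
    """Hypothetical helper: a single return after the loop has finished."""
    space = 0
    for character in text:
        if character == " ":
            space += 1
    return space


print(first_return_wins("a b c d"))  # 1
print(count_all("a b c d"))          # 3
```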

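The code in Joran Beasley's answer is a more Pythonic approach to the problem. Rather than setting up a separate condition for each kind of character, you can use the collections.Counter class to count the occurrences of every character in the file, then extract the counts for the whitespace characters at the end. A Counter works much like a dictionary.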

from collections import Counter

def count_spaces(filename):
    with open(filename) as in_f:
        text = in_f.read()
    count = Counter(text)
    return count[" "], count["\t"], count["\n"]
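
One way a Counter differs from a plain dictionary matters here: looking up a character that never occurred gives 0 instead of raising KeyError, so the return line above is safe even for a file with no tabs. A minimal sketch on a made-up string:

```python
from collections import Counter

# Sample text with one space, one tab, and one newline.
count = Counter("a b\tc\n")

print(count[" "], count["\t"], count["\n"])  # 1 1 1
# A missing key yields 0 and is not added to the Counter.
print(count["\r"])       # 0
print("\r" in count)     # False
```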

Answer 2

To support large files, you can read a fixed number of bytes at a time:

#!/usr/bin/env python
from collections import namedtuple

Count = namedtuple('Count', 'nspaces ntabs nnewlines')

def count_spaces(filename, chunk_size=1 << 13):
    """Count number of spaces, tabs, and newlines in the file."""
    nspaces = ntabs = nnewlines = 0
    # assume ascii-based encoding and b'\n' newline
    with open(filename, 'rb') as file:
        chunk = file.read(chunk_size)
        while chunk:
            nspaces   += chunk.count(b' ')
            ntabs     += chunk.count(b'\t')
            nnewlines += chunk.count(b'\n')
            chunk = file.read(chunk_size)
    return Count(nspaces, ntabs, nnewlines)

if __name__ == "__main__":
    print(count_spaces(__file__))

Output

Count(nspaces=150, ntabs=0, nnewlines=20)
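
Counting per chunk is safe here because each pattern is a single byte, so a match can never straddle a chunk boundary. A multi-byte pattern such as b'\r\n' could be split across two chunks and missed. The sketch below checks the single-byte case with a deliberately tiny chunk size; count_byte and the temporary file are assumptions made for illustration, not part of the answer.

```python
import os
import tempfile


def count_byte(filename, needle, chunk_size=1 << 13):
    """Hypothetical helper: count a single-byte needle chunk by chunk."""
    total = 0
    with open(filename, 'rb') as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(needle)
    return total


# Write sample data to a temporary file (6 bytes repeated 1000 times).
data = b"a b\tc\n" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    small = count_byte(path, b' ', chunk_size=7)  # chunks cut across the data
    large = count_byte(path, b' ')                # default 8 KiB chunks
finally:
    os.remove(path)

print(small, large)  # 1000 1000
```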

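mmap lets you treat a file as a bytestring without actually loading the whole file into memory. For example, you can search it for a regular expression pattern: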

#!/usr/bin/env python3
import mmap
import re
from collections import Counter, namedtuple

Count = namedtuple('Count', 'nspaces ntabs nnewlines')

def count_spaces(filename, chunk_size=1 << 13):
    """Count number of spaces, tabs, and newlines in the file."""
    nspaces = ntabs = nnewlines = 0
    # assume ascii-based encoding and b'\n' newline
    with open(filename, 'rb', 0) as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
        c = Counter(m.group() for m in re.finditer(br'[ \t\n]', s))
        return Count(c[b' '], c[b'\t'], c[b'\n'])

if __name__ == "__main__":
    print(count_spaces(__file__))

Output

Count(nspaces=107, ntabs=0, nnewlines=18)
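
One caveat with the mmap approach: mapping a zero-length file raises ValueError, so an empty file needs separate handling. The following is a sketch under that assumption; count_spaces_mmap and demo are hypothetical names, and it returns a plain tuple rather than the Count namedtuple to stay short.

```python
import mmap
import os
import re
import tempfile
from collections import Counter


def count_spaces_mmap(filename):
    """Return (spaces, tabs, newlines); hypothetical name for this sketch."""
    # mmap cannot map a zero-length file, so handle that case first.
    if os.path.getsize(filename) == 0:
        return 0, 0, 0
    with open(filename, 'rb') as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
        c = Counter(m.group() for m in re.finditer(br'[ \t\n]', s))
    return c[b' '], c[b'\t'], c[b'\n']


def demo(data):
    """Write data to a temporary file, count, then clean up."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        return count_spaces_mmap(tmp.name)
    finally:
        os.remove(tmp.name)


empty_result = demo(b"")
sample_result = demo(b"a b\tc\n\n")
print(empty_result, sample_result)  # (0, 0, 0) (1, 1, 2)
```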

Answer 3

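from collections import Counter
C = Counter(open(afile).read())  # afile: path of the file to inspect
C[' ']  # number of spaces; use C['\t'] and C['\n'] for tabs and newlines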

Answer 4

In my case, a tab (\t) had been converted to "    " (four spaces), so I modified the logic slightly to handle that.

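def count_spaces(filename):
    with open(filename, "r") as f1:
        contents = f1.readlines()

    total_tab = 0
    total_space = 0
    total_newline = 0
    for line in contents:
        # A run of four spaces is treated as one tab.
        four_space_tabs = line.count("    ")
        total_tab += four_space_tabs + line.count("\t")
        # Exclude the spaces that were already counted as tabs.
        total_space += line.count(" ") - 4 * four_space_tabs
        # The last line may lack a trailing newline, so count the characters.
        total_newline += line.count("\n")
    print("Space count = ", total_space)
    print("Tab count = ", total_tab)
    print("New line count = ", total_newline)
    return total_space, total_tab, total_newline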
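
The four-space heuristic relies on str.count counting non-overlapping occurrences, so eight spaces of indentation register as two tab-equivalents, not five. A minimal sketch on a made-up line (the variable names are illustrative only):

```python
# Eight spaces of indentation followed by a short statement.
line = " " * 8 + "x = 1\n"

indent_groups = line.count("    ")  # non-overlapping runs of four spaces
all_spaces = line.count(" ")        # every space, including the indentation

print(indent_groups)  # 2
print(all_spaces)     # 10
```

Note that line.count(" ") also sees the spaces inside each four-space run, which is worth keeping in mind when combining the two totals.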

4 answers

Answer 1: 1 2015-09-26 01:36:55
Answer 2: 1 2015-09-26 03:08:24
Answer 3: 0 2015-09-26 00:36:21
Answer 4: 0 2020-01-12 17:26:36
