Limiting the amount read using readline

Question

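I am trying to read the first 100 lines of large text files. Simple code for doing this is shown below. The challenge, however, is that I have to guard against corrupt or otherwise malformed files that contain no line breaks at all (yes, people somehow find ways to generate these). In those cases I would still like to read in the data (because I need to see what is going on in there), but limit it to n bytes.

The only way I can think of is to read the file character by character. Apart from being slow (probably not an issue for only 100 lines), I am worried that I will run into trouble when I encounter a file that uses a non-ASCII encoding.

Is it possible to limit the number of bytes read with readline()? Or is there a more elegant way to handle this?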

line_count = 0
with open(filepath, 'r') as f:
    for line in f:
        line_count += 1
        print('{0}: {1}'.format(line_count, line))
        if line_count == 100:
            break

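Edit:

As @Fredrik correctly pointed out, readline() accepts an argument that limits the number of characters read (I had assumed it was a buffer size parameter). So, for my purposes, the following works very well: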

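max_bytes = 1024 * 1024
bytes_read = 0  # in text mode this counts characters, not raw bytes
line_count = 0

with open(filepath, "r") as fo:
    # readline(size) returns at most `size` characters, newline or not
    line = fo.readline(max_bytes)
    bytes_read += len(line)
    while line != '':
        line_count += 1
        print('{0}: {1}'.format(line_count, line))
        if (line_count == 100) or (bytes_read >= max_bytes):
            break
        # only ask for what is left of the overall budget
        line = fo.readline(max_bytes - bytes_read)
        bytes_read += len(line)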
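
The same loop can be wrapped in a reusable helper. The following is only a minimal sketch and not part of the original post: the name read_head and its parameters max_lines and max_chars are hypothetical, and in-memory streams stand in for real files. Because the stream is in text mode, the budget counts decoded characters rather than raw bytes.

```python
import io

def read_head(f, max_lines=100, max_chars=1024 * 1024):
    """Return up to max_lines lines from text stream f,
    reading at most max_chars characters in total.
    (Hypothetical helper, for illustration only.)"""
    lines = []
    remaining = max_chars
    while len(lines) < max_lines and remaining > 0:
        # readline never returns more than `remaining` characters
        line = f.readline(remaining)
        if line == '':  # end of file
            break
        lines.append(line)
        remaining -= len(line)
    return lines

# A well-formed file: stops after the requested number of lines
normal = io.StringIO("a\nb\nc\n")
print(read_head(normal, max_lines=2))        # ['a\n', 'b\n']

# A file with no newlines: stops when the character budget is used up
no_newlines = io.StringIO("x" * 50)
print(read_head(no_newlines, max_chars=10))  # ['xxxxxxxxxx']
```

With a real file, the call would be something like read_head(open(filepath, "r")), ideally inside a with block so the file is closed afterwards.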

Answer 1

If you have a file:

f = open("a.txt", "r")
f.readline(size)

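The size argument sets the maximum amount to read: at most size characters for a file opened in text mode, or at most size bytes for a file opened in binary mode.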
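
A small sketch of how the limit behaves, using in-memory streams in place of a real file (the sample string is made up for illustration). In binary mode the limit is counted in bytes, so a cut can land in the middle of a multi-byte character.

```python
import io

text = "héllo wörld, no newline here"

# Text mode: size counts characters
t = io.StringIO(text)
first = t.readline(5)
print(first)       # héllo
print(len(first))  # 5

# Binary mode: size counts bytes ('é' takes 2 bytes in UTF-8)
b = io.BytesIO(text.encode("utf-8"))
raw = b.readline(5)
print(len(raw))             # 5
print(raw.decode("utf-8"))  # héll

# A cut inside a multi-byte character needs errors="replace" to decode
cut = text.encode("utf-8")[:2]  # 'h' plus the first half of 'é'
print(cut.decode("utf-8", errors="replace"))  # 'h' followed by U+FFFD
```

In both modes readline still stops early if it meets a line terminator before the limit is reached.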

Answer 2

This checks for data that contains no newlines:

f = open('abc.txt', 'r')
dodgy = False
if '\n' not in f.read(1024):
    print("Dodgy file - No linefeeds in the first Kb")
    dodgy = True
f.seek(0)  # rewind so the real read starts at the beginning
if not dodgy:  # read the first 100 lines
    for x in range(1, 101):
        try:
            line = next(f)
        except StopIteration:  # the file has fewer than 100 lines
            break
        print('{0}: {1}'.format(x, line))
else:  # read the first n bytes
    line = f.read(1024)
    print('bytes: ' + line)
f.close()
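
An alternative that avoids the separate check is sketched below; it is not from the original answer. It reads one bounded chunk in binary mode and splits that chunk into lines, so a file without newlines simply yields a single long line. The function name head_lines, its parameters, and the choice of UTF-8 with errors='replace' are assumptions made for illustration.

```python
import os
import tempfile

def head_lines(path, max_lines=100, max_bytes=1024):
    """Read at most max_bytes bytes from path and return up to max_lines lines.
    (Hypothetical helper; assumes the file is roughly UTF-8.)"""
    with open(path, 'rb') as f:
        chunk = f.read(max_bytes)  # hard cap in bytes, newline or not
    # errors='replace' keeps going if the cut lands inside a multi-byte character
    text = chunk.decode('utf-8', errors='replace')
    return text.splitlines(keepends=True)[:max_lines]

# Usage: a temporary file with no newlines at all
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'x' * 5000)
lines = head_lines(tmp.name)
print(len(lines), len(lines[0]))  # 1 1024
os.remove(tmp.name)
```

Two caveats: the last line returned may be cut short by the byte cap, and str.splitlines also treats characters such as form feed and U+2028 as line boundaries, which differs slightly from iterating over a file object.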

Answer 1
5, accepted, 2016-02-05 13:57:34

Answer 2
0, 2016-02-05 15:58:31
