
Limiting amount read using readline

I'm trying to read the first 100 lines of large text files. Simple code for doing this is shown below. The challenge, though, is that I have to guard against corrupt or otherwise screwy files that don't have any line breaks (yes, people somehow figure out ways to generate these). In those cases I'd still like to read in data (because I need to see what's going on in there) but limit it to, say, n bytes.

The only way I can think of to do this is to read the file char by char. Other than being slow (probably not an issue for only 100 lines), I am worried that I'll run into trouble when I encounter a file using a non-ASCII encoding.

Is it possible to limit the bytes read using readline()? Or is there a more elegant way to handle this?

line_count = 0
with open(filepath, 'r') as f:
    for line in f:
        line_count += 1
        print('{0}: {1}'.format(line_count, line))
        if line_count == 100:
            break

EDIT:

As @Fredrik correctly pointed out, readline() accepts an arg that limits the number of chars read (I'd thought it was a buffer size param). So, for my purposes, the following works quite well:

max_bytes = 1024 * 1024
bytes_read = 0
line_count = 0

with open(filepath, "r") as fo:
    # In text mode, readline(n) returns at most n characters.
    line = fo.readline(max_bytes)
    bytes_read += len(line)
    while line != '':
        line_count += 1
        print('{0}: {1}'.format(line_count, line))
        # Stop after 100 lines or once the overall size budget is used up.
        if (line_count == 100) or (bytes_read >= max_bytes):
            break
        line = fo.readline(max_bytes - bytes_read)
        bytes_read += len(line)
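Because the file above is opened in text mode, Python 3 counts the limit in characters, not bytes. If a hard byte cap matters, one option is to open the file in binary mode and decode each line afterwards. The following is only a sketch under that assumption: `read_head`, its parameters, and the UTF-8 default are hypothetical names and choices, not part of the original post.

```python
def read_head(path, max_lines=100, max_bytes=1024 * 1024, encoding="utf-8"):
    """Return up to max_lines lines, reading at most max_bytes bytes in total.

    Hypothetical helper: the name, defaults, and encoding are assumptions.
    """
    lines = []
    remaining = max_bytes
    with open(path, "rb") as f:  # binary mode, so sizes are counted in bytes
        while len(lines) < max_lines and remaining > 0:
            chunk = f.readline(remaining)
            if not chunk:  # end of file
                break
            remaining -= len(chunk)
            # errors="replace" keeps going if a multi-byte character was cut
            # at the byte limit or the data is not valid in this encoding.
            lines.append(chunk.decode(encoding, errors="replace"))
    return lines
```

A file with no line breaks then comes back as a single "line" of at most `max_bytes` bytes, while a normal file yields its first 100 lines.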

If you have a file:

f = open("a.txt", "r")
f.readline(size)

The size parameter sets the maximum amount to read: bytes for a file opened in binary mode (and in Python 2), characters for a file opened in text mode in Python 3.

This checks for data with no line breaks:
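As a small illustration of how the size argument behaves, the sketch below uses in-memory streams from the standard `io` module instead of a real file (the sample strings are made up for the example). In Python 3 a text stream counts characters, a binary stream counts bytes, and a newline still ends the read early.

```python
import io

# Text mode: size counts characters. "é" is one character but two bytes in UTF-8.
text = io.StringIO("héllo world, no newline here")
first = text.readline(5)   # at most 5 characters
rest = text.readline(5)    # continues where the previous call stopped

# Binary mode: size counts bytes.
raw = io.BytesIO("héllo world, no newline here".encode("utf-8"))
first_bytes = raw.readline(5)  # at most 5 bytes

# A newline still ends the read early, even when size is larger.
short = io.StringIO("ab\ncdef").readline(100)

print(repr(first), repr(rest), repr(first_bytes), repr(short))
```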

f = open('abc.txt', 'r')
# Treat the file as suspect if the first 1024 characters contain no newline.
dodgy = '\n' not in f.read(1024)
if dodgy:
    print("Dodgy file - No linefeeds in the first Kb")
f.seek(0)
if not dodgy:  # read the first 100 lines
    for x in range(1, 101):
        try:
            line = next(f)
        except StopIteration:  # the file has fewer than 100 lines
            break
        print('{0}: {1}'.format(x, line))
else:  # read the first n bytes
    line = f.read(1024)
    print('bytes: ' + line)
f.close()
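The "first 100 lines" branch above can also be written with `itertools.islice`, which stops on its own when the file runs out of lines. This is only a sketch of that one branch: `first_lines` is a hypothetical helper, the sample data is invented, and it does not by itself protect against a file with no line breaks.

```python
import io
from itertools import islice

def first_lines(f, max_lines=100):
    """Return up to max_lines lines from an open text file object."""
    return list(islice(f, max_lines))

# In-memory stand-in for a 250-line text file.
sample = io.StringIO("".join("row {}\n".format(i) for i in range(250)))
head = first_lines(sample)
print(len(head), repr(head[0]), repr(head[-1]))
```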
