
Limiting the amount read using readline()

I'm trying to read the first 100 lines of large text files. Simple code for doing this is shown below. The challenge, though, is that I have to guard against the case of corrupt or otherwise screwy files that don't have any line breaks (yes, people somehow figure out ways to generate these). In those cases I'd still like to read in data (because I need to see what's going on in there) but limit it to, say, n bytes.

The only way I can think of to do this is to read the file char by char. Other than being slow (probably not an issue for only 100 lines), I'm worried that I'll run into trouble when I encounter a file that uses a non-ASCII encoding.
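The char-by-char approach does not have to break on non-ASCII text. When a file is opened in text mode with the right encoding, read(1) returns one decoded character, so a multi-byte UTF-8 sequence comes back whole. Below is a minimal sketch using a hypothetical helper, read_capped_line, that stops at a newline or at a character cap; io.StringIO stands in for a real file so the example runs anywhere.

```python
import io


def read_capped_line(f, max_chars):
    """Read one line from a text-mode file, stopping after a newline or
    after max_chars characters, whichever comes first (hypothetical helper)."""
    chars = []
    while len(chars) < max_chars:
        ch = f.read(1)  # text mode: one decoded character, never half of one
        if ch == '':    # end of file
            break
        chars.append(ch)
        if ch == '\n':  # normal end of line
            break
    return ''.join(chars)


# io.StringIO stands in for a real file: one normal line, then 50 characters with no line break.
f = io.StringIO('héllo wörld\n' + 'ü' * 50)
print(repr(read_capped_line(f, 100)))  # 'héllo wörld\n'
print(repr(read_capped_line(f, 10)))   # ten 'ü' characters
```

This works, but it makes one method call per character; the readline(limit) approach discussed below does the same job in far fewer calls.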

Is it possible to limit the bytes read using readline()? Or is there a more elegant way to handle this?

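line_count = 0
with open(filepath, 'r') as f:
    # Iterating over the file yields one line at a time; a file with no
    # line breaks comes back as a single, arbitrarily large "line".
    for line in f:
        line_count += 1
        print('{0}: {1}'.format(line_count, line))
        if line_count == 100:
            break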

EDIT:

As @Fredrik correctly pointed out, readline() accepts an arg that limits the number of chars read (I'd thought it was a buffer size param). So, for my purposes, the following works quite well:
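A quick way to see what the argument does: readline(size) returns at most size characters, stops early once it reaches a newline, and returns an empty string only at end of file. This sketch uses io.StringIO as a stand-in for a text file; a file opened with open(..., "r") behaves the same way.

```python
import io

f = io.StringIO('short\n' + 'x' * 20)

print(repr(f.readline(8)))    # 'short\n'  (newline reached before the limit)
print(repr(f.readline(8)))    # 8 x's      (limit reached, no newline in sight)
print(repr(f.readline(100)))  # 12 x's     (the rest of the data)
print(repr(f.readline(100)))  # ''         (end of file)
```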

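max_bytes = 1024*1024   # overall read limit; in text mode len() counts characters, not bytes
bytes_read = 0
line_count = 0

with open(filepath, "r") as fo:
    # Each call asks for at most the remaining budget, so a file with no
    # line breaks can't pull in more than max_bytes characters.
    line = fo.readline(max_bytes)
    bytes_read += len(line)
    while line != '':
        line_count += 1
        print('{0}: {1}'.format(line_count, line))
        if line_count == 100 or bytes_read >= max_bytes:
            break
        line = fo.readline(max_bytes - bytes_read)
        bytes_read += len(line)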
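The same loop can be packaged as a function that takes any text-mode file object and returns the lines instead of printing them, which makes it easy to test. The names head_lines, max_lines and max_chars are hypothetical; as with the loop above, the cap counts characters because the file is read in text mode.

```python
import io


def head_lines(f, max_lines=100, max_chars=1024 * 1024):
    """Return up to max_lines lines from text file f, reading at most
    max_chars characters in total (hypothetical helper).

    Each readline() call asks only for what is left of the budget, so a file
    with no line breaks yields one truncated "line" instead of being read in full.
    """
    lines = []
    chars_read = 0
    while len(lines) < max_lines and chars_read < max_chars:
        line = f.readline(max_chars - chars_read)
        if line == '':  # end of file
            break
        lines.append(line)
        chars_read += len(line)
    return lines


# A well-formed file: only the first 3 lines are returned.
print(head_lines(io.StringIO('a\nb\nc\nd\n'), max_lines=3))  # ['a\n', 'b\n', 'c\n']
# A file with no line breaks: the character budget stops the read.
print(head_lines(io.StringIO('z' * 10_000), max_chars=16))  # one line of 16 'z' characters
```

Returning a list rather than printing keeps the reading logic separate from whatever you do with the preview.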

If you have a file:

f = open("a.txt", "r")
f.readline(size)

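The size parameter sets the maximum amount to read: characters when the file is opened in text mode ("r"), bytes when it is opened in binary mode ("rb"). The call still stops early if it reaches a newline first.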
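The difference between characters and bytes matters for non-ASCII files. This sketch writes a small UTF-8 file (the temporary file exists only for illustration) and reads it back both ways: in text mode readline(2) returns two characters, while in binary mode it returns two bytes, which here cuts the two-byte 'é' in half.

```python
import os
import tempfile

# Write a small UTF-8 test file; 'é' takes two bytes (0xC3 0xA9) in UTF-8.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='utf-8') as out:
    out.write('héllo\n')

try:
    with open(path, 'r', encoding='utf-8') as f:
        text_part = f.readline(2)  # text mode: counts characters
    with open(path, 'rb') as f:
        byte_part = f.readline(2)  # binary mode: counts bytes
finally:
    os.remove(path)

print(repr(text_part))  # 'hé'
print(repr(byte_part))  # b'h\xc3'  (only the first byte of 'é')
```

So if the limit has to be a true byte count, open in binary mode and be prepared for a truncated character at the end of the chunk.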

This checks the first 1 KB for line breaks and falls back to reading a fixed amount if there are none:

with open('abc.txt', 'r') as f:
    dodgy = False
    if '\n' not in f.read(1024):
        print("Dodgy file - No linefeeds in the first KB")
        dodgy = True
    f.seek(0)  # rewind to the start before the real read
    if not dodgy:  # read the first 100 lines
        for x in range(1, 101):
            try:
                line = next(f)
            except StopIteration:  # fewer than 100 lines in the file
                break
            print('{0}: {1}'.format(x, line))
    else:  # read only the first 1024 characters
        line = f.read(1024)
        print('bytes: ' + line)
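The try/next loop can also be written with itertools.islice, which stops on its own after a given number of lines or at end of file. The sketch below wraps the same sniff-then-read idea in a hypothetical preview function that returns its result instead of printing and takes an already-open text file, so it can be exercised with io.StringIO. The limits count characters because the file is in text mode.

```python
import io
from itertools import islice


def preview(f, max_lines=100, sniff_chars=1024):
    """Return ('raw', text) if no newline appears in the first sniff_chars
    characters, else ('lines', [...]) with up to max_lines lines (hypothetical helper)."""
    start = f.read(sniff_chars)
    if '\n' not in start:
        # Dodgy file: the sniffed chunk is all we keep.
        return 'raw', start
    f.seek(0)  # rewind so line reading starts at the beginning
    # islice stops after max_lines lines or at end of file, whichever comes first.
    return 'lines', list(islice(f, max_lines))


print(preview(io.StringIO('one\ntwo\nthree\n'), max_lines=2))  # ('lines', ['one\n', 'two\n'])
print(preview(io.StringIO('x' * 5000), sniff_chars=4))         # ('raw', 'xxxx')
```

Like the answer above, this only inspects the first chunk: a file with a newline early on but a huge line later is still read line by line in full, which the readline(limit) approach avoids.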
