
Python readline() and readlines() not working

I'm trying to read the contents of a 5GB file and then sort them and find duplicates. The file is basically just a list of numbers (each on a new line). There are no empty lines or any symbols other than digits. The numbers are all pretty big (at least 6 digits). I am currently using

for line in f:
    number = int(line)  # do something with each line

to avoid memory problems, and I am fine with that. However, I would like to know why readline() and readlines() didn't work for me. When I try

print(f.readline(10))

the program never returns the line I ask for, no matter which number I pass as a parameter. To be precise, if I do readline(0) it returns an empty line, even though the first line in the file is a big number. If I try readline(1) it returns 2, even though the number 2 is not in the file. When the parameter is >= 6, it always returns the same number: 291965.

Additionally, the readlines() method always returns the same lines no matter what the parameter is. Even print(f.readlines(2)) still gives me a list of over 1000 numbers.

I am not sure if I explained it very well. Sorry, English is not my first language. Anyway, I can make it work without the readline methods but I really want to know why they don't work as expected.

This is what the first 10 lines of the file look like:

548098
968516
853181
485102
69638
689242
319040
610615
936181
486052
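For the task itself (sorting and finding duplicates), the `for line in f` approach from the question is already the memory-friendly route. The sketch below is one possible way to do it, not the asker's code: `find_duplicates` is a hypothetical helper, and it assumes one integer per line and that the distinct values (not the whole 5GB file) fit in memory. `io.StringIO` stands in for the real file.

```python
import io
from collections import Counter


def find_duplicates(f):
    """Return a sorted list of the numbers that appear more than once in f.

    Hypothetical helper. Assumes one integer per line, as in the question,
    and that the *distinct* values fit in memory; the file itself is streamed.
    """
    counts = Counter()
    for line in f:              # iterating a file object yields one line at a time
        counts[int(line)] += 1  # int() ignores the trailing '\n'
    return sorted(n for n, c in counts.items() if c > 1)


# Usage with an in-memory stand-in for open("data.txt"):
sample = io.StringIO("548098\n968516\n548098\n69638\n968516\n")
print(find_duplicates(sample))  # [548098, 968516]
```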

I cannot reproduce f.readline(1) returning 2, or f.readlines(2) returning over 1000 lines, but it seems you misunderstood what the integer parameters to those functions do.

Those numbers do not specify which line to read; they specify the maximum number of bytes readline will read.

>>> f = open("data.txt")
>>> f.readline(1)
'5'
>>> f.readline(100)
'48098\n'

Both commands read from the first line, which is 548098. The first reads only 1 byte, and the second reads the rest of the line, because readline stops at the newline before it reaches the 100-byte limit. If you call readline again, it continues with the second line, and so on.
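The same behaviour can be checked without a 5GB file. This is a minimal sketch using `io.StringIO` as a stand-in for `open("data.txt")`, filled with the first lines from the question; it also shows that `readline(0)` returning an empty string is expected. For text files in Python 3 the limit is counted in characters rather than bytes, which is the same thing for these ASCII digits. If you really want line n, `itertools.islice` is one way to skip to it; `nth_line` is a hypothetical helper, not a file method.

```python
import io
from itertools import islice

# Stand-in for open("data.txt"), using the first lines shown in the question.
f = io.StringIO("548098\n968516\n853181\n485102\n69638\n")

print(repr(f.readline(0)))    # ''         size 0 reads nothing
print(repr(f.readline(1)))    # '5'        at most 1 character
print(repr(f.readline(100)))  # '48098\n'  stops at the newline, well before 100
print(repr(f.readline()))     # '968516\n' the next call continues on the next line


def nth_line(f, n):
    """Hypothetical helper: return line n (0-based) counted from the
    current position, or None if the file has fewer lines."""
    return next(islice(f, n, n + 1), None)


f.seek(0)                     # rewind so line numbers count from the start
print(repr(nth_line(f, 4)))   # '69638\n'
```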

Similarly, f.readlines(10) reads whole lines, starting where the previous readline calls left off, and stops once the total number of bytes read exceeds the given number:

>>> f.readlines(10)
['968516\n', '853181\n']
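A small sketch of the `readlines(hint)` rule, again with `io.StringIO` standing in for the real file. This reflects Python 3's `io` behaviour; the argument is only a hint, and other implementations may read more. The hypothetical `batches` helper shows what the argument is meant for: reading a large file in whole-line chunks of roughly `hint` characters instead of loading it all at once.

```python
import io

f = io.StringIO("548098\n968516\n853181\n485102\n69638\n")
f.readline()              # consume '548098\n' so reading starts on the second line

# hint=10: after '968516\n' the total is 7, still within 10, so reading goes on;
# after '853181\n' the total is 14, which exceeds 10, so it stops there.
print(f.readlines(10))    # ['968516\n', '853181\n']


def batches(f, hint=1 << 20):
    """Hypothetical helper: yield lists of whole lines, each list holding
    roughly `hint` characters, until the file is exhausted."""
    while True:
        lines = f.readlines(hint)
        if not lines:     # an empty list means end of file
            return
        yield lines


f.seek(0)
for batch in batches(f, hint=10):
    print(batch)
```

With this data the loop prints three batches: `['548098\n', '968516\n']`, `['853181\n', '485102\n']` and `['69638\n']`. For a real 5GB file you would keep a larger hint, such as the 1 MiB default assumed here.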
