
In Python, is read() or readlines() faster?

I want to read a huge file in my code. Is read() or readline() faster for this? How about the loop:

for line in fileHandle

For a text file, just iterating over it with a for loop is almost always the way to go. Never mind speed: it is the cleanest.
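For instance, a minimal sketch of that pattern (the filename is just a placeholder):

# Count non-empty lines by iterating over the file object directly.
# Python buffers the underlying reads and yields one line at a time,
# so memory use stays small even for very large files.
count = 0
with open("huge.txt", "r") as fh:  # "huge.txt" is a placeholder filename
    for line in fh:
        if line.strip():
            count += 1
print(count)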

In some versions of Python, readline() really does read just a single line, while the for loop reads large chunks and splits them up into lines, so it may be faster. I think more recent versions of Python use buffering for readline() as well, so the performance difference will be minuscule (for is probably still microscopically faster because it avoids a method call). However, choosing one over the other for performance reasons is probably premature optimisation.

Edit to add: I just checked back through some Python release notes. Python 2.5 said:

It's now illegal to mix iterating over a file with for line in file and calling the file object's read()/readline()/readlines() methods.

Python 2.6 introduced TextIOBase which supports both iterating and readline() simultaneously.

Python 2.7 fixed interleaving read() and readline().

If the file is huge, read() is definitely a bad idea, as (without a size parameter) it loads the whole file into memory.

readline() reads only one line at a time, so I would say it is the better choice for huge files.

And just iterating over the file object should be as effective as using readline().

See http://docs.python.org/tutorial/inputoutput.html#methods-of-file-objects for more info.
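A minimal sketch of the readline() approach (the filename is a placeholder):

# Read one line at a time with readline(); an empty string signals end of file.
with open("huge.txt", "r") as fh:  # placeholder filename
    while True:
        line = fh.readline()
        if not line:   # "" means EOF; a blank line in the file is still "\n"
            break
        # handle the line here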

The docs for readlines() indicate there is an optional sizehint argument. Because it is so vague, it's easy to overlook, but I found this to often be the fastest way to read files: use readlines(1), which hints one line, but in fact reads in about 4 KB or 8 KB worth of lines, IIRC. This takes advantage of the OS buffering and reduces the number of calls somewhat without using an excessive amount of memory.

You can experiment with different values of the sizehint, but I found 1 to be optimal on my platform when I was testing this.
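A sketch of that batching pattern, with one caveat: on Python 3 text files, readlines(hint) stops as soon as the total size of the lines read exceeds hint, so readlines(1) there returns only a single line; a larger hint (e.g. 65536) is what gives the chunk-at-a-time behaviour described above. The filename is a placeholder.

# Read the file in batches of lines using the hint argument.
with open("huge.txt", "r") as fh:  # placeholder filename
    while True:
        lines = fh.readlines(65536)   # roughly 64 KB worth of lines per call
        if not lines:                 # empty list signals end of file
            break
        for line in lines:
            pass                      # handle each line here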

read() basically tries to read the whole file and save it into a single string to be used later, while readlines() also reads the whole file but splits it (roughly a split("\n")) and stores the line strings in a list. Hence, neither of these two methods is preferred if the file size is excessively big.
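To illustrate the difference in what the two calls return (the filename is a placeholder):

with open("small.txt", "r") as fh:   # placeholder filename
    whole_text = fh.read()           # one str holding the entire file

with open("small.txt", "r") as fh:
    lines = fh.readlines()           # list of str, one entry per line, newlines kept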

readline() and the for loop (i.e. for line in file:) will read one line at a time and store it in a string. I guess they will take about the same time to finish the job if memory allows. However, these two are preferred if the file size is huge.

If you have enough memory, use read() if performance is a concern. I have seen that with a gzip file, doing read().split('\n') took 5 seconds to loop through, whereas using the iterator took 38 seconds. The size of the GZ file was around 45 MB.
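A rough sketch of how such a comparison could be timed (the filename is a placeholder; actual numbers will depend on the Python version, the file, and the disk):

import gzip
import time

start = time.perf_counter()
with gzip.open("data.txt.gz", "rt") as fh:   # placeholder filename
    lines = fh.read().split("\n")            # decompress everything, then split
print("read().split:", time.perf_counter() - start, "seconds,", len(lines), "lines")

start = time.perf_counter()
count = 0
with gzip.open("data.txt.gz", "rt") as fh:
    for line in fh:                          # line-by-line iteration
        count += 1
print("iterator:    ", time.perf_counter() - start, "seconds,", count, "lines")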

The real difference between read() and readlines(): read() simply loads the file as-is into memory as a single string. readlines() reads the file as a list of lines, each keeping its trailing newline. readlines() should only be used on text files, and neither should be used on large files. If you are copying the contents of a text file, read() works well, because the result can be passed straight to write() without needing to add line terminators.
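For example, a simple whole-file copy along those lines (the filenames are placeholders):

# read() keeps the newlines, so the text can be written back out unchanged.
with open("source.txt", "r") as src, open("copy.txt", "w") as dst:
    dst.write(src.read())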

If your file is a text file, then readlines() is the obvious way to read a file containing lines. Apart from that: perform benchmarks if you are really concerned about possible performance problems. I doubt that you will encounter any issues; the speed of the filesystem should be the limiting factor.
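A quick way to run such a benchmark yourself with the standard library (the filename and repeat count are placeholders):

import timeit

def iterate():
    with open("sample.txt") as fh:     # placeholder filename
        for line in fh:
            pass

def read_all_lines():
    with open("sample.txt") as fh:
        fh.readlines()

print("for line in file:", timeit.timeit(iterate, number=10))
print("readlines():     ", timeit.timeit(read_all_lines, number=10))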

Neither. Both of them will read the whole content into memory. In the case of big files, iterating over the file object loads only one line of your file at a time and is perhaps the best way to deal with the contents of a huge file.
