I know the general difference between readlines and readline on a file object, but I'm more curious about how their performance differs, so I made a test:
import timeit

with open('test.txt', 'w') as f:
    f.writelines('\n'.join("Just a test case\tJust a test case2\tJust a test case3" for i in range(1000000)))

def a1():
    with open('test.txt', 'r') as f:
        for text in f.readlines():
            pass

def a2():
    with open('test.txt', 'r') as f:
        text = f.readline()
        while text:
            text = f.readline()

print(timeit.timeit(a1, number=100))
print(timeit.timeit(a2, number=100))
$python readline_vs_readlines.py
38.410646996984724
35.876863296027295
But why is that? I thought I/O was the expensive part, so reading many times instead of reading everything into memory at once should take more time. Given these results, why do we use readlines at all? It costs an enormous amount of memory when the file is large, with no gain in speed.
Actually, readlines is even slower when compared to readline used in a for loop:
import timeit

with open('test.txt', 'w') as fp:
    print(*("Just a test case" for i in range(1000000)), sep='\n', file=fp)

def a1():
    with open('test.txt', 'r') as f:
        for _ in f.readlines():
            pass

def a2():
    with open('test.txt', 'r') as f:
        while _ := f.readline():
            pass

def a3():
    with open('test.txt', 'r') as f:
        for _ in iter(f.readline, ''):
            pass

print(timeit.timeit(a1, number=50))
print(timeit.timeit(a2, number=50))
print(timeit.timeit(a3, number=50))
output:
10.9471131
10.282239
9.3618919
Comparing the same kind of for loop, a3 is clearly faster than a1, although it goes against the Zen of Python.
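For reference, the two-argument form of iter used in a3 calls f.readline repeatedly until it returns the sentinel '' (which readline returns at EOF). A minimal sketch of the same pattern, using StringIO so it is self-contained:

```python
# Sketch: iter(callable, sentinel) keeps calling the callable until it
# returns the sentinel, then raises StopIteration.
from io import StringIO

buf = StringIO("line1\nline2\n")
lines = list(iter(buf.readline, ''))  # readline returns '' at EOF
print(lines)  # ['line1\n', 'line2\n']
```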
The reason for this lies in the source code, in _pyio.py and iobase.c. (When the C implementation in iobase.c is unavailable, the pure-Python _pyio.py is used.)
def readlines(self, hint=None):
    """Return a list of lines from the stream.

    hint can be specified to control the number of lines read: no more
    lines will be read if the total size (in bytes/characters) of all
    lines so far exceeds hint.
    """
    if hint is None or hint <= 0:
        return list(self)
    n = 0
    lines = []
    for line in self:
        lines.append(line)
        n += len(line)
        if n >= hint:
            break
    return lines
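A quick way to see the hint behavior described in that docstring: once the running character count of the lines collected so far reaches hint, no more lines are read. A small sketch (using StringIO as a stand-in for a file):

```python
# Sketch: readlines(hint) stops collecting lines once the total size of
# the lines read so far reaches hint -- it does not always read the
# whole stream.
from io import StringIO

buf = StringIO("aaaa\nbbbb\ncccc\n")
partial = buf.readlines(6)   # 'aaaa\n' is 5 chars (< 6), 'bbbb\n' pushes it to 10 (>= 6), stop
print(partial)               # ['aaaa\n', 'bbbb\n']

buf.seek(0)
full = buf.readlines()       # with no hint, all lines are returned
print(full)                  # ['aaaa\n', 'bbbb\n', 'cccc\n']
```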
It appends each line as it reads it - sharing the same mechanics as readline - it is not loading the entire file at once. The same is true for the C implementation in iobase.c:
while (1) {
    Py_ssize_t line_length;
    PyObject *line = PyIter_Next(it);
    if (line == NULL) {
        if (PyErr_Occurred()) {
            goto error;
        }
        else
            break; /* StopIteration raised */
    }
    if (PyList_Append(result, line) < 0) {
        Py_DECREF(line);
        goto error;
    }
    line_length = PyObject_Size(line);
    Py_DECREF(line);
    if (line_length < 0) {
        goto error;
    }
    if (line_length > hint - length)
        break;
    length += line_length;
}
As you can see, it calls PyList_Append to append each line to the result list.
PS: Just a reminder, joining is only half as bad as concatenating for this many strings; using print with the sep and file parameters is recommended instead. Do not join strings where it's not needed.
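To illustrate that PS, here is the print-with-sep/file pattern writing many lines without building one giant joined string first (StringIO stands in for a real file here):

```python
# Sketch: print(*iterable, sep='\n', file=...) writes the items
# separated by newlines, without first joining them into one string.
import io

lines = (f"row {i}" for i in range(3))
buf = io.StringIO()
print(*lines, sep='\n', file=buf)
print(repr(buf.getvalue()))  # 'row 0\nrow 1\nrow 2\n'
```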
readlines reads all the text into memory before starting the loop, while readline just reads a buffer at a time, automatically, while looping. Here's a better explanation of the memory comparison between the two.
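The memory difference follows from the return types: readlines materializes a list holding every line at once, while the file object itself is a lazy iterator that yields lines on demand. A small self-contained sketch (StringIO standing in for a file on disk):

```python
# Sketch: readlines builds the whole list up front; iterating the file
# object itself processes one line at a time instead.
from io import StringIO

data = "x\n" * 1000

eager = StringIO(data).readlines()   # a real list: all 1000 lines in memory
print(len(eager))                    # 1000

# Memory-friendly alternative: iterate the stream directly.
count = sum(1 for _ in StringIO(data))
print(count)                         # 1000
```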