
What's the most pythonic way to iterate over all the lines of multiple files?

I want to treat many files as if they were all one file. What's the proper Pythonic way to go from [filenames] => [file objects] => [lines] using generators, without reading an entire file into memory?

We all know the proper way to open a file:

with open("auth.log", "rb") as f:
    print(sum(1 for line in f))  # count lines without loading the whole file

And we know the correct way to link several iterators/generators into one long one:

>>> list(itertools.chain(range(3), range(3)))
[0, 1, 2, 0, 1, 2]

But how do I link multiple files together and still preserve the context managers?

with open("auth.log", "rb") as f0:
    with open("auth.log.1", "rb") as f1:
        for line in itertools.chain(f0, f1):
            do_stuff_with(line)

    # f1 is now closed
# f0 is now closed
# gross

I could ignore the context managers and do something like this, but it doesn't feel right (the * unpacking opens every file up front, and nothing closes them explicitly):

files = itertools.chain(*(open(f, "rb") for f in file_names))
for line in files:
    do_stuff_with(line)

Or is this the kind of thing Async IO (PEP 3156) is for, and I'll just have to wait for the elegant syntax later?

There's always fileinput.

import fileinput

for line in fileinput.input(filenames):
    ...

Reading the source, however, it appears that fileinput.FileInput can't be used as a context manager [1]. To fix that, you could use contextlib.closing, since FileInput instances have a sanely implemented close method:

import fileinput
from contextlib import closing

with closing(fileinput.input(filenames)) as line_iter:
    for line in line_iter:
        ...
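
For anyone who wants to try the closing approach end to end, here is a minimal sketch. The helper name count_lines and the sample file contents are made up for illustration; only fileinput.input and contextlib.closing come from the answer above. Note that fileinput.input falls back to sys.argv or stdin when given an empty list, so the sketch assumes at least one filename.

```python
import fileinput
import os
import tempfile
from contextlib import closing


def count_lines(filenames):
    """Count lines across all files; closing() calls FileInput.close() on exit."""
    total = 0
    with closing(fileinput.input(filenames)) as line_iter:
        for _ in line_iter:
            total += 1
    return total


# Hypothetical sample data: two small log files in a temporary directory.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("auth.log", "a\nb\n"), ("auth.log.1", "c\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as fh:
        fh.write(text)
    paths.append(path)

print(count_lines(paths))  # 3
```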

An alternative that keeps the context manager is to write a simple function that loops over the files and yields lines as it goes:

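def fileinput(files):  # note: this name shadows the stdlib fileinput module
    for f in files:
        with open(f, 'r') as fin:  # each file is closed before the next one opens
            for line in fin:
                yield line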

No real need for itertools.chain here IMHO. The magic is in the yield statement, which transforms an ordinary function into a fantastically lazy generator: nothing is opened until you start iterating, and only one file is open at a time.
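
A quick way to see the generator approach in action is sketched below. The function is the same idea as the one above, but under the hypothetical name iter_lines so it doesn't clash with the stdlib fileinput module; the file names and contents are invented sample data.

```python
import os
import tempfile


def iter_lines(filenames):
    """Yield lines from each file in turn; only one file is open at a time."""
    for filename in filenames:
        with open(filename, "r") as fin:
            for line in fin:
                yield line


# Hypothetical sample data: two small log files in a temporary directory.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("auth.log", "first\nsecond\n"), ("auth.log.1", "third\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as fh:
        fh.write(text)
    paths.append(path)

# The caller sees one continuous stream of lines.
lines = [line.rstrip("\n") for line in iter_lines(paths)]
print(lines)  # ['first', 'second', 'third']
```

Because the body of a generator function does not run until the first item is requested, calling iter_lines(paths) by itself opens nothing.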


[1] As an aside, starting with Python 3.2, fileinput.FileInput is implemented as a context manager, which does exactly what we did before with contextlib. Now our example becomes:

# Python 3.2+ version
with fileinput.input(filenames) as line_iter:
    for line in line_iter:
        ...

although the contextlib.closing example will work on Python 3.2+ as well.
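
One thing fileinput gives you over a hand-written generator is bookkeeping about where each line came from. The sketch below assumes Python 3.2+ and uses the FileInput methods filename() and filelineno(); the helper name tag_lines and the sample files are made up for illustration.

```python
import fileinput
import os
import tempfile


def tag_lines(filenames):
    """Return (file basename, line number within that file, text) per line."""
    tagged = []
    with fileinput.input(filenames) as line_iter:
        for line in line_iter:
            tagged.append((
                os.path.basename(line_iter.filename()),
                line_iter.filelineno(),
                line.rstrip("\n"),
            ))
    return tagged


# Hypothetical sample data: two small log files in a temporary directory.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("auth.log", "a\nb\n"), ("auth.log.1", "c\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as fh:
        fh.write(text)
    paths.append(path)

for entry in tag_lines(paths):
    print(entry)
```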


 