
What is the most Pythonic way to interleave text file contents?

Python question:

If I have a list of files, how do I print line #1 from each file, then line #2, etc.? (I'm a Python newbie, obviously...)

Example:

file1:
foo1
bar1

file2:
foo2
bar2

file3:
foo3
bar3

Function call:

names = ["file1", "file2", "file3"]
myfct(names)

Desired output:

foo1
foo2
foo3

bar1
bar2
bar3

This is how I did it, but I'm sure there is a more elegant, Pythonic way:

def myfct(files):
    file_handlers = []
    for myfile in files:
        file_handlers.append(open(myfile))
    while True:
        done = False
        for handler in file_handlers:
            line = handler.readline()
            eof = len(line) == 0 # wrong
            if (eof):
                done = True
                break
            print(line, end = "")
        print()
        if done == True:
            break

PS: I'm using Python 2.6 with from __future__ import print_function.

import itertools
import sys
for lines in itertools.izip(*file_handlers):
    sys.stdout.write(''.join(lines))
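
For readers on Python 3, where `itertools.izip` no longer exists and the built-in `zip` is already lazy, the same idea might be sketched as below. The helper name `interleave_shortest` and the `io.StringIO` objects standing in for open files are assumptions made for illustration. Like `izip`, this stops as soon as the shortest input runs out.

```python
import io


def interleave_shortest(handles):
    """Yield line 1 of every handle, then line 2 of every handle, and so on.

    Stops at the shortest input, because zip() stops there.
    """
    for lines in zip(*handles):
        for line in lines:
            yield line


# StringIO objects stand in for open files (hypothetical sample data
# copied from the question).
handles = [io.StringIO("foo1\nbar1\n"),
           io.StringIO("foo2\nbar2\n"),
           io.StringIO("foo3\nbar3\n")]
result = "".join(interleave_shortest(handles))
print(result, end="")
```

To get the blank line between groups shown in the desired output, one option is to print an empty line after each tuple that `zip` produces.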
> cat foo
foo 1
foo 2
foo 3
foo 4
> cat bar
bar 1
bar 2
> cat interleave.py 
from itertools import izip_longest
from contextlib import nested

with nested(open('foo'), open('bar')) as (foo, bar):
    for line in (line for pair in izip_longest(foo, bar)
                      for line in pair if line):
        print line.strip()
> python interleave.py 
foo 1
bar 1
foo 2
bar 2
foo 3
foo 4

Compared to the other answers here:

  • the files are closed on exit
  • izip_longest keeps going after the shortest file is exhausted
  • lines are read lazily, so no file is loaded into memory whole

Or, for multiple files (filenames is a list of file names):

with nested(*(open(name) for name in filenames)) as handles:
    for line in (line for group in izip_longest(*handles)
                      for line in group if line):
        print line.strip()
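
`contextlib.nested` was removed in Python 3, and `izip_longest` is called `zip_longest` there. A rough Python 3 counterpart, assuming Python 3.3 or later for `contextlib.ExitStack`, could look like the sketch below. The function name `interleave_files` and the temporary files are hypothetical and only mirror the `foo` and `bar` files above. It strips only the trailing newline rather than calling `strip()`, so leading whitespace in a line is preserved.

```python
import os
import tempfile
from contextlib import ExitStack
from itertools import zip_longest


def interleave_files(filenames):
    """Yield lines from several files in turn, tolerating unequal lengths."""
    with ExitStack() as stack:
        # Every file entered on the stack is closed when the block exits.
        handles = [stack.enter_context(open(name)) for name in filenames]
        for group in zip_longest(*handles):
            for line in group:
                # zip_longest pads exhausted files with None; skip the padding.
                if line is not None:
                    yield line.rstrip("\n")


# Demo: two temporary files with the same contents as 'foo' and 'bar'.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("foo", "foo 1\nfoo 2\nfoo 3\nfoo 4\n"),
                   ("bar", "bar 1\nbar 2\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

interleaved = list(interleave_files(paths))
print("\n".join(interleaved))
```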

If all your files have the same number of lines, or if you want to stop as soon as any file is exhausted, Ignacio's answer is perfect. If you want to support files of different lengths, though, you should use the "round robin" recipe from the itertools documentation:

import sys
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).next for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))

sys.stdout.writelines(roundrobin(*file_handlers))
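
The recipe as quoted relies on the Python 2 `.next` method of iterators. On Python 3 the bound method is `__next__`, so an adaptation might look like the following sketch. The `io.StringIO` objects are stand-ins for open files and are an assumption for the demo, as is the renamed loop variable `get_next`.

```python
import io
import sys
from itertools import cycle, islice


def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    pending = len(iterables)
    # Python 3: use the bound __next__ method instead of Python 2's .next
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for get_next in nexts:
                yield get_next()
        except StopIteration:
            # One iterable ran dry; rebuild the cycle from the remaining ones.
            pending -= 1
            nexts = cycle(islice(nexts, pending))


letters = list(roundrobin("ABC", "D", "EF"))

# StringIO objects stand in for files of different lengths.
handles = [io.StringIO("foo 1\nfoo 2\nfoo 3\n"), io.StringIO("bar 1\n")]
lines = list(roundrobin(*handles))
sys.stdout.writelines(lines)
```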


 