Concatenate a large number of files line by line in Python

Thanks for lending your eyes here.

I'm processing some spectral data stored in several hundred text files (1.txt, 2.txt, 3.txt, ...). They all have exactly the same number of lines and are formatted like this:

1.txt:             2.txt:            3.txt:
1,5                1,4               1,7
2,8                2,9               2,14
3,10               3,2               3,5
4,13               4,17              4,9
<...>              <...>             <...>
4096,1             4096,7            4096,18

I'm trying to concatenate them line by line so that I end up with one output file like this:

5,4,7
8,9,14
10,2,5
13,17,9
<...>
1,7,18

I'm very new to Python, and I'd really appreciate some help here. I've attempted this mess:

howmanyfiles=8
output=open('output.txt','w+')
for j in range(howmanyfiles):
    fp=open(str(j+1) + '.txt','r')
    if j==0:
        for i, line in enumerate(fp):
            splitline=line.split(",")
            output.write(splitline[1])
    else:
        output.close()
        output=open('output.txt','r+')
        for i, line in enumerate(fp):
            splitline=line.split(",")
            output.write(output.readline(i)[:-1]+","+splitline[1])
    fp.close()
output.close()

My thinking above is that I need to move the cursor back to the beginning of the output file for each input file, but it's really blowing up in my face.

Thanks dearly.

-matt
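The one-file-at-a-time idea from the question can work without rewinding any file cursor: keep the output rows in a list in memory, append one column per input file, and write everything once at the end. This is a minimal sketch, assuming the files are named 1.txt ... N.txt and every line has the form index,value; the function name `merge_by_columns` and its parameters are hypothetical. Only one input file is open at a time, which may help if opening several hundred files at once would hit the operating system's limit on open files.

```python
from pathlib import Path


def merge_by_columns(num_files, out_path="output.txt", directory="."):
    """Hypothetical helper: build the output one input file at a time.

    Assumes files 1.txt .. {num_files}.txt exist in `directory`, all with the
    same number of lines, each line formatted as "index,value".
    """
    rows = None  # rows[k] collects the value from line k of every file so far
    for n in range(1, num_files + 1):
        path = Path(directory) / "{}.txt".format(n)
        with open(path) as fp:
            # Keep only the text after the first comma, without the newline
            values = [line.rstrip("\n").partition(",")[2] for line in fp]
        if rows is None:
            # First file: start one row per line
            rows = [[v] for v in values]
        elif len(values) != len(rows):
            raise ValueError(
                "{} has {} lines, expected {}".format(path, len(values), len(rows))
            )
        else:
            # Later files: append this file's value as a new column
            for row, value in zip(rows, values):
                row.append(value)
    # Write all rows in one pass at the end; no seeking back is needed
    with open(out_path, "w") as output:
        for row in rows or []:
            output.write(",".join(row) + "\n")
```

The trade-off is memory: every value from every file is held until the end, which should be modest for a few hundred files of 4096 short lines each.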

I think you can get a lot of mileage out of the zip built-in function, which will let you iterate over all the input files at the same time:

from contextlib import ExitStack

num_files = 8
with open("output.txt", "w") as output, ExitStack() as stack:
    # Open 1.txt .. num_files.txt; ExitStack closes all of them on exit
    files = [stack.enter_context(open("{}.txt".format(i + 1)))
             for i in range(num_files)]
    for lines in zip(*files):  # lines is a tuple with one line from each file
        # Keep the text after the first comma, minus each line's trailing newline
        new_line = ",".join(line.rstrip("\n").partition(",")[2] for line in lines) + "\n"
        output.write(new_line)
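One caveat with zip: it stops silently at the shortest input, so if one file were shorter than the rest, the output would be truncated without any error. The question says every file has the same number of lines; a small check like the sketch below can confirm that assumption while merging. It uses itertools.zip_longest with a sentinel object; `zip_equal` is a hypothetical helper, not a library function. On Python 3.10 and later, `zip(*files, strict=True)` raises ValueError on unequal lengths and does the same job.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel that can never equal a real line


def zip_equal(*iterables):
    """Hypothetical helper: like zip(), but raise ValueError on unequal lengths."""
    for items in zip_longest(*iterables, fillvalue=_MISSING):
        # A sentinel means at least one input ran out before the others
        if any(item is _MISSING for item in items):
            raise ValueError("inputs have different numbers of lines")
        yield items
```

Replacing `zip(*files)` with `zip_equal(*files)` in the loop leaves the rest of the code unchanged.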

Here's a fun way to do it with generators:

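import sys

# Input file names are passed on the command line
files     = sys.argv[1:]
handles   = (open(f) for f in files)
readers   = ((line.strip() for line in h) for h in handles)
splitters = ((line.split(',')[1] for line in r) for r in readers)
# zip takes one value from each file at a time, so each tuple is one output row
joiners   = (",".join(row) for row in zip(*splitters))

for j in joiners:
    print(j)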
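When the file names come from the command line via a shell glob such as `*.txt`, they are sorted as text rather than as numbers, so 10.txt comes before 2.txt and the output columns end up out of order. A minimal sketch of sorting the paths by the number in each file name before merging; `numeric_key` and `merge_lines` are hypothetical helpers, and the key assumes every name is a plain integer followed by .txt.

```python
from contextlib import ExitStack
from pathlib import Path


def numeric_key(path):
    """Sort key: the integer in a name like '17.txt' (assumes that naming scheme)."""
    return int(Path(path).stem)


def merge_lines(paths):
    """Yield one output row per input line: each file's second field, comma-joined."""
    ordered = sorted(paths, key=numeric_key)
    with ExitStack() as stack:
        # All files stay open while rows are produced, and are closed afterwards
        handles = [stack.enter_context(open(p)) for p in ordered]
        for lines in zip(*handles):
            yield ",".join(line.strip().split(",")[1] for line in lines)
```

Used from a script, `for row in merge_lines(sys.argv[1:]): print(row)` prints the rows in numeric file order regardless of how the shell ordered the arguments.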

You might also look into the Unix paste command, which merges the corresponding lines of several files.
