
Combining columns of CSV files of unknown lengths but same width in Python

I have an unknown number of input CSV files that look more or less like this (same width, various lengths):

Header1, Header2, Header3, Header4
1,2,3,4
11,22,33,44
1,2,3,4

The output should look like this:

Header1,Header3, ,Header1,Header3, ,...
1,3, ,1,3, ,... 
...

Currently I can read all the input files, and I know how to read the first line of each file and print it in the desired format, but I am stuck on how to write a loop that moves on to the next line of each file and prints that data. Since the files have different lengths, I don't know how to handle one file ending before the others, or how to insert blank placeholders to keep the format. Below is my code.

csvs = []
hold = []
i=0         # was i=-1 to start, improved
for files in names:
    i=i+1
    csvs.append([i])
    hold.append([i])

#z=0
for z in range(i):
    # create a csv reader for each file (these are reader objects, not strings)
    csvs[z] = csv.reader(open(names[z],'rb'), delimiter=',')

line = []    
#z=0
for z in range(i):
    hold[z]=csvs[z].next()
    line = line + [hold[z][0], hold[z][2], ' ']  # Header1 and Header3 are indices 0 and 2

print line
writefile.writerow(line)

names is the list that holds the CSV file paths. I am also fairly new to this, so if you see anywhere I could do things better, I'm all ears.

Let's assume you know how to merge the rows themselves. Here's a way to make iterating over the lines of several files easier, even when some files are longer than others.

import csv
from itertools import izip_longest
# http://docs.python.org/library/itertools.html#itertools.izip_longest

# get a list of open readers using a list comprehension
# (in Python 2 the csv module expects files opened in binary mode)
readers = [csv.reader(open(fname, "rb")) for fname in list_of_filenames]

# open writer
output_csv = csv.writer(...)

for bunch_of_lines in izip_longest(*readers, fillvalue=['', '', '', '']):
    # Here bunch_of_lines is a tuple of lines read from each reader,
    # e.g. all first lines, all second lines, etc.
    # When one file is past EOF but others aren't, you get fillvalue for its line.
    merged_row = []
    for line in bunch_of_lines:
        # If it's a real line, you have 4 items of data.
        # If the file is past EOF, the line is the fillvalue from above,
        # which is guaranteed to have 4 items of data, all empty strings.
        merged_row.extend([line[0], line[2]])  # columns 1 and 3 (0-based indices 0 and 2)
    output_csv.writerow(merged_row)

This loop keeps going until the longest file is exhausted, and it is only five lines of code. I think you can work out the headers yourself.
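The snippet above is Python 2. A minimal Python 3 sketch of the same idea follows. It assumes itertools.zip_longest (the Python 3 name of izip_longest), in-memory io.StringIO objects standing in for the opened files, and a hypothetical helper called merge_columns. It also appends the blank separator column from the desired output and passes skipinitialspace=True so that headers written as "Header1, Header2" lose their leading spaces.

```python
import csv
import io
import sys
from itertools import zip_longest


def merge_columns(file_objs, columns=(0, 2), width=4, separator=True):
    """Merge selected columns of several CSV files side by side.

    A file that runs out of rows early contributes empty fields, so the
    loop only stops once the longest file is exhausted.
    """
    readers = [csv.reader(f, skipinitialspace=True) for f in file_objs]
    pad = [''] * width  # stands in for a row of an exhausted file
    merged = []
    for rows in zip_longest(*readers, fillvalue=pad):
        out = []
        for row in rows:
            out.extend(row[i] for i in columns)
            if separator:
                out.append('')  # blank column between files
        merged.append(out)
    return merged


if __name__ == "__main__":
    # hypothetical sample data; with real files, open them with newline=''
    a = io.StringIO("Header1, Header2, Header3, Header4\n1,2,3,4\n11,22,33,44\n")
    b = io.StringIO("Header1, Header2, Header3, Header4\n5,6,7,8\n")
    csv.writer(sys.stdout).writerows(merge_columns([a, b]))
```

The padding row only has to be as wide as the widest column index you read; here width=4 matches the four-column files in the question.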

A note: in Python, you rarely need range() and integer-indexed access to lists once you understand how for loops and list comprehensions work. Python's for is what foreach is in other languages; it has nothing to do with indices.
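For example, the index bookkeeping in the question's setup code (the i counter, range(i) and csvs[z]) can be replaced by iterating over the readers directly. This is a minimal Python 3 sketch; first_line_merged and the in-memory files are hypothetical, and columns 0 and 2 are assumed to be Header1 and Header3.

```python
import csv
import io


def first_line_merged(file_objs, columns=(0, 2)):
    """Build the merged first row without range() or index bookkeeping."""
    # one reader per file, built directly from the list of files
    readers = [csv.reader(f, skipinitialspace=True) for f in file_objs]
    line = []
    for reader in readers:  # iterate over the readers themselves, no indices
        row = next(reader)  # Python 3 spelling of reader.next()
        line.extend(row[i] for i in columns)
        line.append('')  # blank separator column, as in the desired output
    return line
```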

The following version doesn't produce the spare blank columns shown in your output, but they are easy to add by appending an extra empty field to data each time we append to it (see the commented-out lines):

import csv

names=['test1.csv','test2.csv']
csvs = []
done = []
for name in names:
    csvs.append(csv.reader(open(name, 'rb')))
    done.append(False)

while not all(done):
    data = []
    for i, c in enumerate(csvs):
        if not done[i]:
            try:
                row = c.next()
            except StopIteration:
                done[i] = True
        if done[i]:
            data.append('')
            data.append('')
            # data.append('')  <-- here
        else:
            data.append(row[0])
            data.append(row[2])  # Header3 is index 2 (0-based)
            # data.append('')   <-- and here for extra commas
    if not all(done):
        print ','.join(data)

Also, I don't close any of the files explicitly, which you should do if this were part of a long-running process.
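One way to guarantee the files get closed when their number is only known at runtime is contextlib.ExitStack (Python 3.3+). The sketch below is a Python 3 take on the done-flag loop above, under these assumptions: merge_until_all_done is a hypothetical name, columns 0 and 2 are the wanted ones, and next(reader, None) replaces the try/except StopIteration.

```python
import csv
from contextlib import ExitStack


def merge_until_all_done(paths, columns=(0, 2), separator=True):
    """Python 3 sketch of the done-flag loop; returns the merged rows."""
    merged = []
    # ExitStack closes every file it opened when the with-block exits,
    # even if an exception is raised part-way through.
    with ExitStack() as stack:
        readers = [
            csv.reader(stack.enter_context(open(p, newline='')),
                       skipinitialspace=True)
            for p in paths
        ]
        done = [False] * len(readers)
        while True:
            data = []
            for i, reader in enumerate(readers):
                # next() with a default returns None at EOF instead of raising
                row = None if done[i] else next(reader, None)
                if row is None:
                    done[i] = True
                    data.extend('' for _ in columns)
                else:
                    data.extend(row[c] for c in columns)
                if separator:
                    data.append('')  # blank column between files
            if all(done):
                break  # every file is exhausted; drop the all-blank row
            merged.append(data)
    return merged
```

The returned rows can then be written with csv.writer(...).writerows(...) on an output file opened with newline=''.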
