Using CSV module to append multiple files while removing appended headers

I would like to use the Python CSV module to open a CSV file for appending. Then, from a list of CSV files, I would like to read each csv file and write it to the appended CSV file. My script works great - except that I cannot find a way to remove the headers from all but the first CSV file being read. I am certain that my else block of code is not executing properly. Perhaps my syntax for my if else code is the problem? Any thoughts would be appreciated.

writeFile = open(append_file,'a+b')
writer = csv.writer(writeFile,dialect='excel')
    for files in lstFiles:
        readFile = open(input_file,'rU')
        reader = csv.reader(readFile,dialect='excel')
        for i in range(0,len(lstFiles)):
            if i == 0:
                oldHeader = readFile.readline() 
                newHeader = writeFile.write(oldHeader) 
                for row in reader: 
                    writer.writerow(row)
            else:
                reader.next()
                for row in reader:
                    row = readFile.readlines()
                    writer.writerow(row)
        readFile.close()
writeFile.close() 

You're effectively iterating over lstFiles twice: for each file in your list, the inner for loop counts up from 0 again, so the i == 0 branch (the one that writes the header) runs for every file, not just the first. You want something like:

writeFile = open(append_file,'a+b')
writer = csv.writer(writeFile,dialect='excel')
headers_needed = True
for input_file in lstFiles:
    readFile = open(input_file,'rU')
    reader = csv.reader(readFile,dialect='excel')
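    # Always consume the header row so it never reaches the row loop below.
    oldHeader = reader.next()
    if headers_needed:
        # Only the first file's header is written to the output.
        writer.writerow(oldHeader)
        headers_needed = False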
    for row in reader:
        writer.writerow(row)
    readFile.close()
writeFile.close()

You could also use enumerate over the lstFiles to iterate over tuples containing the iteration count and the filename, but I think the boolean shows the logic more clearly.
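For comparison, here is a minimal sketch of the enumerate variant. It assumes Python 3 (so next(reader) replaces reader.next(), and files are opened in text mode with newline='' as the csv module documentation recommends); the function name append_csv_files is only illustrative and does not come from the question.

```python
import csv

def append_csv_files(append_file, lstFiles):
    """Append every file in lstFiles to append_file, keeping only the first header."""
    # Assumption: Python 3, so text mode with newline='' instead of 'a+b' / 'rU'.
    with open(append_file, 'a', newline='') as writeFile:
        writer = csv.writer(writeFile, dialect='excel')
        # enumerate yields (index, filename) pairs; index 0 is the first file.
        for i, input_file in enumerate(lstFiles):
            with open(input_file, newline='') as readFile:
                reader = csv.reader(readFile, dialect='excel')
                # Always consume the header row; None means the file was empty.
                header = next(reader, None)
                if i == 0 and header is not None:
                    writer.writerow(header)
                # Copy the remaining rows unchanged.
                writer.writerows(reader)
```

Note that if the first file happens to be empty, this version writes no header at all, which is one reason the boolean flag can be the more robust choice.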

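You probably do not want to mix iterating over the csv reader and directly calling readline on the underlying file. In Python 2, iterating over a file uses a hidden read-ahead buffer, so combining it with readline() or readlines() on the same file can skip or lose lines.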

I think you're iterating too many times (over various things: both your list of files and the files themselves). You've definitely got some consistency problems (for example, the loop variable is files but the code opens input_file); it's a little hard to be sure since we can't see your variable initializations. This is what I think you want:

with open(append_file,'a+b') as writeFile:
    need_headers = True
    for input_file in lstFiles:
        with open(input_file,'rU') as readFile:
            headers = readFile.readline()
            if need_headers:
                # Write the headers only if we need them
                writeFile.write(headers)
                need_headers = False
            # Now write the rest of the input file.
            for line in readFile:
                writeFile.write(line)

I took out all the csv-specific stuff since there's no reason to use it for this operation. I also cleaned the code up considerably to make it easier to follow, using the files as context managers and a well-named boolean instead of the "magic" i == 0 check. The result is a much nicer block of code that (hopefully) won't have you jumping through hoops to understand what's going on.
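One thing to watch with plain line copying: if an input file does not end with a newline, the first data row of the next file gets glued onto its last row. The following is a sketch of the same idea with that guard added, assuming Python 3 (text mode with newline='' so line endings pass through untouched); the function name append_text_files is hypothetical.

```python
def append_text_files(append_file, lstFiles):
    """Concatenate files line by line, writing only the first header line."""
    # Assumption: Python 3; newline='' keeps the original line endings as-is.
    with open(append_file, 'a', newline='') as writeFile:
        need_headers = True
        for input_file in lstFiles:
            with open(input_file, newline='') as readFile:
                headers = readFile.readline()
                last_written = ''
                if need_headers and headers:
                    # Write the headers only if we need them.
                    writeFile.write(headers)
                    last_written = headers
                    need_headers = False
                # Now write the rest of the input file.
                for line in readFile:
                    writeFile.write(line)
                    last_written = line
                # Guard: terminate the final line so the next file starts cleanly.
                if last_written and not last_written.endswith(('\n', '\r')):
                    writeFile.write('\n')
```

This still treats the header as exactly one physical line, so it would not suit files whose header contains a quoted field with an embedded newline; the csv-based version handles that case.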
