
Merging all csvs in a folder and adding a new column with filename of original file in Python

I am trying to merge all the csv files in a folder into one large csv file. I also need to add a new column to this merged csv that shows the original file that each row came from. This is the code I have so far:

import csv
import glob


read_files = glob.glob("*.csv")

source = []

with open("combined.files.csv", "wb") as outfile:
    for f in read_files:
        source.append(f)
        with open(f, "rb") as infile:
            outfile.write(infile.read())

I know I have to somehow repeat each f for as many rows as are in each csv and then append that as a new column to the .write command, but I am not sure how to do this. Thank you everyone!

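If you add the filename as the final column, you don't need to parse the CSV at all, as long as no quoted field contains an embedded line break. Just read each file line by line, append the filename, and write the line out. And don't open the files in binary mode! In Python 3 that yields bytes, which can't be formatted together with the filename string.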

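import glob
import os

out_filename = "combined.files.csv"
# Remove any previous output so the glob below does not pick it up as input.
if os.path.exists(out_filename):
    os.remove(out_filename)

read_files = glob.glob("*.csv")
with open(out_filename, "w") as outfile:
    for filename in read_files:
        with open(filename) as infile:
            for line in infile:
                # Drop only the line ending (strip() would also eat leading
                # whitespace in the first field), then append the filename.
                outfile.write('{},{}\n'.format(line.rstrip('\n'), filename))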

If your CSVs share a common header line, write it to the output file once and suppress it in the rest:

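import os
import glob

want_header = True
out_filename = "combined.files.csv"

# Remove any previous output so the glob below does not pick it up as input.
if os.path.exists(out_filename):
    os.remove(out_filename)

read_files = glob.glob("*.csv")

with open(out_filename, "w") as outfile:
    for filename in read_files:
        with open(filename) as infile:
            # Consume the header; the None default avoids StopIteration
            # when a file is empty.
            header = next(infile, None)
            if header is None:
                continue
            if want_header:
                # Write the header once, naming the new column.
                outfile.write('{},Filename\n'.format(header.rstrip('\n')))
                want_header = False
            for line in infile:
                outfile.write('{},{}\n'.format(line.rstrip('\n'), filename))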
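The line-by-line approach breaks if a quoted field contains a line break, because one record then spans several physical lines. In that case the csv module is probably the safer route. Here is a minimal sketch under that assumption; the function name merge_csvs and its parameters are made up for illustration and are not part of the original answer:

```python
import csv
import glob
import os


def merge_csvs(folder, out_filename, has_header=True):
    """Merge every *.csv in `folder` into one file with a Filename column."""
    out_path = os.path.join(folder, out_filename)
    # Sort for a deterministic order and never read the output file itself.
    paths = sorted(
        p for p in glob.glob(os.path.join(folder, "*.csv"))
        if os.path.abspath(p) != os.path.abspath(out_path)
    )
    header_written = False
    # newline="" lets the csv module handle line endings and embedded newlines.
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for path in paths:
            name = os.path.basename(path)
            with open(path, newline="") as infile:
                reader = csv.reader(infile)
                if has_header:
                    header = next(reader, None)  # None for an empty file
                    if header is None:
                        continue
                    if not header_written:
                        # Keep the first header and name the new column.
                        writer.writerow(header + ["Filename"])
                        header_written = True
                for row in reader:
                    writer.writerow(row + [name])
    return out_path
```

Calling merge_csvs(".", "combined.files.csv") should behave like the header-aware script above, except that records are parsed and re-written by the csv module, so quoting in the output may differ slightly from the input files.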
