
Merge multiple csv files into one

I have roughly 20 CSV files (all with headers) that I would like to merge into one CSV file.

Looking online, one way I found was to use the terminal command:

cat *.csv > file.csv

This worked, but because every CSV file has a header row, each of those headers also ends up in the merged file.

Is there a terminal command or Python script with which I can merge all those CSV files into one and keep only one header?

Thank you so much

You can do this with awk:

awk '(NR == 1) || (FNR > 1)' *.csv > file.csv

FNR is the record number (typically the line number) within the current file, and NR is the running record number across all files. So the condition accepts the first line of the first file (NR == 1) and every line after the first in each file (FNR > 1), which means the header lines of the subsequent files are skipped.
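For readers more at home in Python, the same NR/FNR logic can be sketched with the standard fileinput module, whose lineno() and filelineno() methods play the roles of NR and FNR. This is only an illustration of the condition above, not a replacement for the awk one-liner; the function name merge_keep_first_header is made up for this example.

```python
import fileinput


def merge_keep_first_header(paths):
    """Yield lines from all files, keeping only the first file's header line."""
    with fileinput.input(files=paths) as stream:
        for line in stream:
            # lineno() counts lines across all files (like awk's NR);
            # filelineno() restarts at 1 for every file (like awk's FNR).
            if stream.lineno() == 1 or stream.filelineno() > 1:
                yield line
```

Writing the yielded lines to a file, for example with out.writelines(merge_keep_first_header(paths)), would give the same result as the awk command for plain line-oriented input.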

The awk command does assume that all your CSV files have the same number of columns in the same order.
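That assumption can be checked before merging. Below is a minimal sketch, assuming the first row of each file is its header; the function name check_same_header is hypothetical and the error message is just one possible wording.

```python
import csv


def check_same_header(paths):
    """Return the shared header row, or raise ValueError on the first mismatch."""
    expected = None
    for path in paths:
        # newline="" is what the csv module documentation recommends for files
        with open(path, newline="") as handle:
            header = next(csv.reader(handle), None)
        if expected is None:
            expected = header
        elif header != expected:
            raise ValueError(f"{path}: header {header} differs from {expected}")
    return expected
```

If this raises, the files differ in column names or column order, and a plain line-by-line merge would silently produce misaligned rows.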

My vote goes to the Awk solution, but since this question explicitly asks about Python, here is a solution for that.

import csv
import sys


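# Merged rows go to standard output; redirect it to a file when running.
writer = csv.writer(sys.stdout)

firstfile = True
for file in sys.argv[1:]:
    # newline='' lets the csv module handle line endings itself
    with open(file, 'r', newline='') as rawfile:
        reader = csv.reader(rawfile)
        for idx, row in enumerate(reader):
            # enumerate() is zero-based by default; 0 is the header row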
            if idx == 0 and not firstfile:
                continue
            writer.writerow(row)
    firstfile = False

Usage: python script.py first.csv second.csv etc.csv >final.csv

This simple script doesn't really benefit from any Python features. But if you need to count the number of fields in non-trivial CSV files (i.e., files with quoted fields that might contain a comma which isn't a separator), that's hard in Awk and trivial in Python, because the csv library already knows exactly how to handle that.
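To make the point about quoted fields concrete, here is a small sketch comparing a naive split on commas with csv.reader. The sample data is invented for the illustration.

```python
import csv
import io

# Invented sample: the second row has a quoted field containing a comma.
sample = 'id,name,city\n1,"Doe, Jane",Oslo\n'

# Naive splitting treats the comma inside the quotes as a separator.
naive_counts = [len(line.split(",")) for line in sample.splitlines()]

# csv.reader honors the quoting, so both rows have three fields.
parsed_counts = [len(row) for row in csv.reader(io.StringIO(sample))]

print(naive_counts)   # [3, 4]
print(parsed_counts)  # [3, 3]
```

The naive count is off by one for the quoted row, which is the kind of mistake a plain field-splitting approach makes on such data.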

This command should work for you:

tail -qn +2 *.csv > file.csv

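Do note two things, though. First, this drops the first line of every file, including the first one, so the merged file has no header row at all; you would have to add it back yourself (for example, by writing the output of head -n 1 on one of the files before the tail output). Second, each file must end with a newline character. Otherwise the last row of one file and the first data row of the next end up on the same line: for two files containing the rows 1, 1 and 2, 2, you would get 1, 12, 2 on a single line instead of 1, 1 in row 1 and 2, 2 in row 2.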
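If you cannot be sure that every file ends with a newline, a Python version can add the missing newline while merging. The following is a rough sketch, assuming the header occupies exactly the first line of each file; the function name merge_with_newline_fix is made up for this example. It works on raw lines, like tail, and does not parse the CSV.

```python
def merge_with_newline_fix(paths, out_path):
    """Concatenate files, keeping one header and ending every row with a newline."""
    with open(out_path, "w", newline="") as out:
        for file_index, path in enumerate(paths):
            # newline="" keeps the original line endings untouched
            with open(path, newline="") as handle:
                for line_index, line in enumerate(handle):
                    if line_index == 0 and file_index > 0:
                        continue  # skip the repeated header of later files
                    if not line.endswith(("\n", "\r")):
                        line += "\n"  # last line of a file lacking a trailing newline
                    out.write(line)
```

Pick an output path that is not among the input paths, otherwise the function would truncate one of its own inputs.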
