If I have a Python list of CSV files, how do I merge them all into one giant CSV file?

I created a list of files like this:

import pandas

# Read each participant's merge.csv (P2 through P11) into a DataFrame.
merge_files = []
for i in range(2, 12):
    merge_files.append(pandas.read_csv(final_user_study_path + "/P" + str(i) + "/DataCollection/data/merge.csv"))

I want to create one giant CSV file from all the files in this list.

Is this the most efficient way to do it?

I recommend the Unix shell. If the files have no headers, or only the first one has a header:

cat file1.csv file2.csv ... fileN.csv > result.csv

If every file has a header, keep the first file whole and strip the header line from the rest (replace N with the actual number of files):

cat file1.csv > result.csv
for i in {2..N}; do tail -n +2 "file$i.csv" >> result.csv; done

If the files are in different directories, give the path to each file:

cat path1/file.csv path2/file.csv > result.csv
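
If a shell is not available, the same header-skipping idea can be sketched in plain Python without pandas. This is only a minimal sketch: the function name merge_csv_keep_first_header and the throwaway file names are illustrative, and it assumes every file has the same columns in the same order and a header that fits on one line.

```python
import tempfile
from pathlib import Path


def merge_csv_keep_first_header(paths, out_path):
    """Append CSV files line by line, keeping only the first file's header."""
    with open(out_path, "w") as out:
        for n, path in enumerate(paths):
            with open(path) as src:
                if n > 0:
                    next(src, None)  # drop the header of every file after the first
                for line in src:
                    # Guard against a file whose last line lacks a newline.
                    out.write(line if line.endswith("\n") else line + "\n")


# Small demonstration with two throwaway files (hypothetical names).
tmp = Path(tempfile.mkdtemp())
(tmp / "file1.csv").write_text("a,b\n1,2\n")
(tmp / "file2.csv").write_text("a,b\n3,4")
merge_csv_keep_first_header([tmp / "file1.csv", tmp / "file2.csv"], tmp / "result.csv")
merged = (tmp / "result.csv").read_text()
print(merged)
```

Because the files are streamed line by line, nothing is parsed and memory use stays small, which is roughly what the cat/tail approach does.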

The pandas way is to use concat on the DataFrames. This is useful if you also want to do some processing along the way (filtering, removing duplicates, etc.):

import io
import pandas as pd

Let's create two in-memory files:

csv1 = "a,b\n1,2"
csv2 = "a,b\n3,4"

file1 = io.StringIO(csv1)
file2 = io.StringIO(csv2)

Loop over them and concat:

pd.concat((pd.read_csv(i) for i in [file1,file2])).to_csv(index=False)

Results in:

'a,b\n1,2\n3,4\n'
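
As for the extra operations mentioned earlier, here is a small sketch of removing duplicates and filtering between concat and to_csv. The sample data and the condition a > 1 are made up for illustration.

```python
import io

import pandas as pd

# Two in-memory CSV files that share one row (3,4).
csv1 = "a,b\n1,2\n3,4"
csv2 = "a,b\n3,4\n5,6"

frames = [pd.read_csv(io.StringIO(text)) for text in (csv1, csv2)]
combined = pd.concat(frames, ignore_index=True)
combined = combined.drop_duplicates()   # drop the repeated row 3,4
combined = combined[combined["a"] > 1]  # keep only rows where a > 1 (arbitrary filter)
result = combined.to_csv(index=False)
print(result)
```

Here the rows 3,4 and 5,6 survive: the duplicate 3,4 is dropped and the row 1,2 is filtered out.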

Adapted for you in a readable way (my preferred way):

files = []
for i in range(2, 12):
    path = "{}/P{}/DataCollection/data/merge.csv".format(final_user_study_path,i)
    files.append(path)

pd.concat((pd.read_csv(i) for i in files)).to_csv("output.csv",index=False)
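
If the combined data might not fit comfortably in memory, one possible variation is to write each file to the output as soon as it is read, using to_csv in append mode. This is a sketch with throwaway files standing in for the real merge.csv paths, and it assumes all files share the same columns in the same order, since append mode does not align columns.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Throwaway inputs standing in for the per-participant merge.csv files.
tmp = Path(tempfile.mkdtemp())
inputs = []
for i, text in enumerate(["a,b\n1,2\n", "a,b\n3,4\n"], start=2):
    path = tmp / "P{}.csv".format(i)
    path.write_text(text)
    inputs.append(path)

output = tmp / "output.csv"
for n, path in enumerate(inputs):
    pd.read_csv(path).to_csv(
        output,
        mode="w" if n == 0 else "a",  # overwrite on the first file, then append
        header=(n == 0),               # write the column names only once
        index=False,
    )

out_lines = output.read_text().splitlines()
print(out_lines)
```

Only one DataFrame is held in memory at a time, at the cost of not being able to do whole-dataset operations such as drop_duplicates across files.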
