If I have a Python list of CSV files, how do I merge them all into one giant CSV file?

Question

I created a list of files like this:

merge_files = []
for i in range(2, 12):
    merge_files.append(pandas.read_csv(final_user_study_path + "/P" + str(i) + "/DataCollection/data/merge.csv"))

I want to create a giant csv file with all the files from this list.

Is this the most efficient way to do this?

Answer 1

I recommend the Unix shell. If the files have no headers, or only the first one has a header:

cat file1.csv file2.csv ... fileN.csv > result.csv

If every file has a header, keep the first file whole and strip the header line from the rest (replace N with the actual number of files):

cat file1.csv > result.csv
for i in {2..N}; do tail -n +2 "file$i.csv" >> result.csv; done

If the files are in different directories, give the path to each file:

cat path1/file.csv path2/file.csv > result.csv
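
If you would rather stay in Python, the same header-stripping idea can be sketched without pandas. This is only a minimal sketch: the function name `merge_csv_keep_first_header` is made up for illustration, and it assumes every file has exactly one header line and no quoted fields containing line breaks (the same assumption the shell commands make).

```python
def merge_csv_keep_first_header(paths, out_path):
    """Concatenate CSV files line by line, keeping only the first file's header."""
    # newline="" stops Python from translating line endings on read or write.
    with open(out_path, "w", newline="") as out:
        for file_index, path in enumerate(paths):
            with open(path, newline="") as src:
                for line_number, line in enumerate(src):
                    # Skip the header line of every file except the first.
                    if line_number == 0 and file_index > 0:
                        continue
                    # Make sure a file without a trailing newline does not
                    # get glued onto the first row of the next file.
                    out.write(line if line.endswith("\n") else line + "\n")
```

Because it copies lines instead of parsing them, this never holds more than one line in memory, which may matter if the merged file is very large.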

Answer 2

The pandas way is to call concat on the DataFrames. This is useful if you also want to do some processing along the way (filtering, removing duplicates, etc.).

import io
import pandas as pd

Let's create two in-memory files:

csv1 = "a,b\n1,2"
csv2 = "a,b\n3,4"

file1 = io.StringIO(csv1)
file2 = io.StringIO(csv2)

Loop over them and concat:

pd.concat((pd.read_csv(i) for i in [file1,file2])).to_csv(index=False)

Results in:

'a,b\n1,2\n3,4\n'

Adapted for you in a readable way (my preferred way):

files = []
for i in range(2, 12):
    path = "{}/P{}/DataCollection/data/merge.csv".format(final_user_study_path,i)
    files.append(path)

pd.concat((pd.read_csv(i) for i in files)).to_csv("output.csv",index=False)
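
As an example of doing "some operations too", here is a hedged sketch that removes duplicate rows before writing. The function name `merge_frames` is hypothetical; `ignore_index=True` and `drop_duplicates()` are optional extras, not something the question requires.

```python
import pandas as pd


def merge_frames(paths, out_path):
    """Read each CSV, stack the rows, drop exact duplicate rows, write one CSV."""
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating
    # each file's own 0-based index.
    merged = pd.concat(frames, ignore_index=True)
    # Optional clean-up step: remove rows that are identical in every column.
    merged = merged.drop_duplicates().reset_index(drop=True)
    merged.to_csv(out_path, index=False)
    return merged
```

If you already hold a list of DataFrames, as with merge_files in the question, you can pass that list straight to pd.concat instead of reading the files again.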
