How to read multiple files and merge them into a single pandas data frame?

Question

I want to read multiple files located in the same directory and then merge them into a single pandas data frame.

It works if I do it this way:

import pandas as pd

df1 = pd.read_csv("data/12015.csv")
df2 = pd.read_csv("data/22015.csv")
df3 = pd.read_csv("data/32015.csv")

df = pd.concat([df1, df2, df3])
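By default pd.concat keeps each frame's original row index, so the combined frame in the snippet above will have repeated index labels. Here is a small self-contained illustration. It uses in-memory frames in place of the CSV files, and the column name `value` and its data are made up.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from the three CSV files
df1 = pd.DataFrame({"value": [10, 11]})
df2 = pd.DataFrame({"value": [20]})
df3 = pd.DataFrame({"value": [30, 31]})

# Default behaviour: the original index labels are kept, so they repeat
stacked = pd.concat([df1, df2, df3])
print(stacked.index.tolist())  # [0, 1, 0, 0, 1]

# ignore_index=True discards the old labels and renumbers rows 0..n-1
renumbered = pd.concat([df1, df2, df3], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```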

However, I would like a more elegant solution, which would be especially useful when there are more than three files.

I tried the following approach, but I don't know how to apply concat inside the for loop.

import pandas as pd
import os
from os import path

files = [x for x in os.listdir("data") if path.isfile("data"+os.sep+x)]

for f in files:
    df = pd.read_csv("data/"+f)
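One common way to finish the loop is not to call concat inside it at all. Instead, append each frame to a list inside the loop and call pd.concat once after the loop. The sketch below assumes every regular file in the directory is a CSV with the same columns. The function name `read_all_csvs` is hypothetical, and the sorting and `ignore_index=True` are additions the question does not ask for.

```python
import os
import pandas as pd

def read_all_csvs(directory):
    """Hypothetical helper: read every regular file in `directory` as CSV
    (assumed to share the same columns) and stack them into one DataFrame."""
    frames = []  # collect the frames here instead of concatenating in the loop
    for name in sorted(os.listdir(directory)):  # sorted for a stable row order
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):
            frames.append(pd.read_csv(full_path))
    # One concat after the loop; concatenating on every pass would
    # re-copy the growing frame each time.
    return pd.concat(frames, ignore_index=True)
```

Usage would be `df = read_all_csvs("data")`. Note that pd.concat raises a ValueError if the list ends up empty, for example if the directory has no files.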

Answer 1

You can use a list comprehension to build the list of DataFrames and then call pd.concat() on that list. Example:

import pandas as pd
import os
from os import path

# One DataFrame per regular file in "data", then a single concat
dfs = [pd.read_csv(path.join("data", x))
       for x in os.listdir("data")
       if path.isfile(path.join("data", x))]
df = pd.concat(dfs)

You should also consider using os.path.join() to build the paths, as I have done above, rather than concatenating the strings yourself.
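If the directory can also contain files that are not CSVs, the standard-library glob module can restrict the match to a filename pattern. The sketch below assumes CSV files end in `.csv`. The function name `concat_csv_dir` and its `pattern` parameter are hypothetical. glob does not guarantee any ordering, so the paths are sorted to make the row order reproducible.

```python
import glob
import os
import pandas as pd

def concat_csv_dir(directory, pattern="*.csv"):
    """Hypothetical helper: concatenate only the files matching `pattern`."""
    # glob returns matching paths in arbitrary order; sort for determinism
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    frames = [pd.read_csv(p) for p in paths]
    # ignore_index=True gives the result a fresh 0..n-1 row index
    return pd.concat(frames, ignore_index=True)
```

Files such as `notes.txt` in the same directory are skipped because they do not match the pattern.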

Answer 2

A simple list comprehension should suffice:

dfs = pd.concat([pd.read_csv("data/" + f) for f in files])
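If the combined rows need to remember which file they came from, one option is to tag each frame with DataFrame.assign before concatenating. This is an extension and not part of the answer above. The column name `source` and the function name `concat_with_source` are hypothetical, and `files` is assumed to be a list of file names like the one in the question.

```python
import os
import pandas as pd

def concat_with_source(directory, files):
    """Hypothetical helper: concatenate CSVs and record each row's file name."""
    frames = [
        # assign() returns a new frame with an extra constant column
        pd.read_csv(os.path.join(directory, f)).assign(source=f)
        for f in files
    ]
    return pd.concat(frames, ignore_index=True)
```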

A more fault-tolerant approach, which skips files that fail to load and records their names, is as follows:

df_list = []
bad_files = []
for f in files:
    try:
        df_list.append(pd.read_csv("data/" + f))
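    except Exception:  # a bare except would also swallow KeyboardInterrupt
        bad_files.append(f)
dfs = pd.concat(df_list)  # raises ValueError if every file failed to load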
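The catch-all handler above treats every failure as a bad file, including problems such as a permission error that may deserve to surface. The sketch below narrows the handler to errors that usually indicate a malformed file. It also avoids the ValueError pd.concat raises when no file loads. Which exceptions count as "bad file" is an assumption, and the function name `read_csvs_tolerant` is hypothetical.

```python
import os
import pandas as pd

def read_csvs_tolerant(directory, files):
    """Hypothetical helper: return (combined DataFrame, list of skipped files)."""
    df_list, bad_files = [], []
    for f in files:
        try:
            df_list.append(pd.read_csv(os.path.join(directory, f)))
        except (pd.errors.EmptyDataError, pd.errors.ParserError, UnicodeDecodeError):
            # Skip only content problems; other errors still propagate
            bad_files.append(f)
    # pd.concat([]) raises ValueError, so fall back to an empty frame
    df = pd.concat(df_list, ignore_index=True) if df_list else pd.DataFrame()
    return df, bad_files
```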


Answer 1: score 5, accepted, posted 2015-10-01 17:39:18

Answer 2: score 2, posted 2015-10-01 17:59:56
