How do I concatenate multiple csv files into a pandas dataframe, with the filenames as the row names?

Question

For Part 1, I have multiple csv files which I loop through to create new csv files with just summary statistics (medians). The new csv files have the original filename + 'summary_' at the start. This part is okay.

For Part 2, I want to concatenate all of the 'summary_' files (they have the same column names as each other), but have the row names in the concatenated dataframe the same as the name of the respective 'summary_' csv file where the data comes from.

With Stack Overflow's help, I have solved Part 1, but not Part 2 yet. I can concatenate all of the csv files, but not just the ones with 'summary_' in the name (i.e. the new csv files created in Part 1), and not with the correct row names...


import os
import pandas as pd
import glob

## Part 1

summary_stats = ['median']

filenames = (filename for filename in os.listdir(os.curdir) if os.path.splitext(filename)[1] == '.csv')

for filename in filenames:
    df = pd.read_csv(filename)

    summary_df = df.agg(summary_stats)
    summary_df.to_csv(f'summary_{filename}')

## Part 2

path = r'/Users/Desktop/Practice code'
all_files = glob.glob(path + "/*.csv")

frames = []  # avoid naming this 'list', which would shadow the built-in

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    frames.append(df)

frame = pd.concat(frames, axis=0, ignore_index=True)

Answer 1

Make sure that all_files only contains the files matching "summary_*.csv".
Then collect each dataframe in a list and combine them with pd.concat(). (DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so it is best avoided.)

So your code might look something like this

path = r'/Users/Desktop/Practice code'
all_files = glob.glob(path + "/summary_*.csv")

frames = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    # Row label: the part of the filename between 'summary_' and '.csv'
    df['row'] = filename.split('summary_')[1].split('.csv')[0]
    # set_index returns a new dataframe, so keep the result
    frames.append(df.set_index('row'))

summary_df = pd.concat(frames)
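
One thing worth checking with the files from Part 1: to_csv() also writes the index, so each 'summary_' file should have an unnamed first column holding the label 'median'. Below is a minimal sketch of how that column comes back when the file is read, and how index_col=0 turns it into the index so it can be relabelled. The data and the name 'summary_example' are made up for illustration, and io.StringIO stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical data standing in for one of the original csv files.
raw = pd.DataFrame({"a": [1, 2, 3], "b": [10.0, 20.0, 60.0]})

# Part 1 writes the summary with its index, so the csv text
# starts with an unnamed column that holds 'median'.
csv_text = raw.agg(["median"]).to_csv()

# Default read: the old index comes back as a data column named 'Unnamed: 0'.
as_column = pd.read_csv(io.StringIO(csv_text))

# index_col=0 restores it as the index, which can then be relabelled.
summary = pd.read_csv(io.StringIO(csv_text), index_col=0)
summary.index = ["summary_example"]  # hypothetical filename stem
```

If your summary files were written with index=False instead, there is no such column and index_col=0 would not be appropriate.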

Answer 2

Consider pathlib.Path, an object-oriented interface to filesystem paths that will simplify your life.

Keeping your logic, you can use glob directly on Path objects, then combine with pandas.concat() to concatenate dataframes as you load your csv files.

import pandas as pd
from pathlib import Path

srcdir = Path(r'/Users/Desktop/Practice code')               # Directory containing the summary files

df = pd.concat((pd.read_csv(file, index_col=None, header=0)  # Concatenate dataframes from a generator
                for file in srcdir.glob('summary_*.csv')),   # Using pathlib.Path's glob
               axis=0, ignore_index=True)                    # Keeping your concat settings

Regarding the row names, please add to your question an extract of your summary_*.csv files and the row names you want.
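
In the meantime, here is one possible sketch for the row names. It assumes each summary file holds a single 'median' row whose label was written as the first column (as to_csv() does by default), and that the row name you want is the filename without the '.csv' extension. The function name concat_summaries and the throwaway files are made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def concat_summaries(srcdir):
    """Concatenate every summary_*.csv in srcdir, one row per file."""
    files = sorted(Path(srcdir).glob("summary_*.csv"))
    if not files:
        raise FileNotFoundError(f"no summary_*.csv files in {srcdir}")
    # Dict keys become the outer index level; Path.stem drops the '.csv' suffix.
    frames = {f.stem: pd.read_csv(f, index_col=0) for f in files}
    combined = pd.concat(frames)
    # Drop the inner level (the 'median' label) so only the filename remains.
    return combined.droplevel(1)


# Usage with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, values in {"x": [1, 2, 3], "y": [4, 5, 9]}.items():
        summary = pd.DataFrame({"a": values}).agg(["median"])
        summary.to_csv(Path(tmp) / f"summary_{name}.csv")
    Path(tmp, "x.csv").write_text("a\n1\n")  # not a summary file, so it is ignored
    result = concat_summaries(tmp)
```

Passing a dict to pd.concat() is just one way to attach the labels; setting the index on each dataframe before concatenating would work as well.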

2 answers

solution1
0 ACCEPTED 2019-03-27 20:31:08

solution2
0 2019-03-27 21:05:17
