For Part 1, I have multiple csv files which I loop through to create new csv files with just summary statistics (medians). The new csv files have the original filename + 'summary_' at the start. This part is okay.
For Part 2, I want to concatenate all of the 'summary_' files (they have the same column names as each other), but have the row names in the concatenated dataframe the same as the name of the respective 'summary_' csv file where the data comes from.
With stackoverflow's help, I have solved Part 1, but not Part 2 yet. I can concatenate all of the csv files, but not just the ones with 'summary_' in the name (ie the new csv's created in Part 1), and not with the correct row names...
import os
import pandas as pd
import glob
## Part 1
summary_stats = ['median']
filenames = (filename for filename in os.listdir(os.curdir) if os.path.splitext(filename)[1] == '.csv')
for filename in filenames:
df = pd.read_csv(filename, )
summary_df = df.agg(summary_stats)
summary_df.to_csv(f'summary_{filename}')
## Part 2
path = r'/Users/Desktop/Practice code'
all_files = glob.glob(path + "/*.csv")
list = []
for filename in all_files:
df = pd.read_csv(filename, index_col=None, header=0)
list.append(df)
frame = pd.concat(list, axis=0, ignore_index=True)
Please make sure that the all_files
is only loading the files matching "summary_*.csv"
Then, you can append rows from one dataframe to another using df.append()
So your code might look something like this
path = r'/Users/Desktop/Practice code'
all_files = glob.glob(path + "/summary_*.csv")
summary_df = None
for filename in all_files:
df = pd.read_csv(filename, index_col=None, header=0)
df['row'] = filename.split('summary_')[1].split('.csv')[0]
df.set_index('row')
if summary_df is None:
summary_df = df
else:
summary_df = summary_df.append(df)
Introducing pathlib.Path
, an object-oriented interface to paths that will simplify your life.
Keeping your logic, you can use glob
directly on Path
objects, then combine with pandas.concat()
to concatenate dataframes as you load your csv files.
import pandas
from pathlib import Path
srcdir = Path(r'/Users/Desktop/Practice code') # Get current working directory
df = pd.concat(pd.read_csv(file, index_col=None, header=0) # Concatenate dataframes from generator
for file in srcdir.glob('summary_*.csv'), # Using pathlib.Path's glob
axis=0, ignore_index=True) # Keeping your concat settings
Regarding the row names, you should integrate in your question an extract of your summary_*.csv files and the desired row name.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.