
Adding a column with the file name to multiple .csv files while combining them into a single DataFrame

I have 50 .csv files with over 188k rows combined, and I need to add the file name to each row so that I can track which file it came from. Below is the code I am using, which is able to combine the files into a single df.

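import pandas as pd

# DataFrame.append was removed in pandas 2.0, so collect the frames and concat once
frames = []
for file in files:
    if file.endswith('.csv'):
        frames.append(pd.read_csv(file))
df = pd.concat(frames, ignore_index=True)
df.head()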

You're almost there. Instead of appending the result of read_csv() directly, store it in a variable and add a new column holding the file name:

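import pandas as pd

frames = []
for file in files:
    if file.endswith('.csv'):
        df_new = pd.read_csv(file)
        df_new['from_file'] = file  # tag every row with the file it came from
        frames.append(df_new)
# DataFrame.append was removed in pandas 2.0; concatenate once at the end
df = pd.concat(frames, ignore_index=True)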
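A more compact variant, offered as a sketch rather than as the answerer's code: `DataFrame.assign` adds the column inline, and a generator feeds `pd.concat` directly. The function name `combine_with_source` and the sample files `jan.csv`/`feb.csv` are hypothetical, made up only for this illustration.

```python
import os
import tempfile

import pandas as pd


def combine_with_source(paths):
    """Read every .csv path and stack the results, tagging rows with their source path."""
    frames = (
        pd.read_csv(path).assign(from_file=path)  # new column holds the path string
        for path in paths
        if path.endswith('.csv')  # checked before read_csv, so other files are never opened
    )
    return pd.concat(frames, ignore_index=True)


# Demo with two throwaway files (hypothetical names and contents)
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, body in [('jan.csv', 'x\n1\n2\n'), ('feb.csv', 'x\n3\n')]:
        path = os.path.join(tmp, name)
        with open(path, 'w') as fh:
            fh.write(body)
        paths.append(path)
    combined = combine_with_source(paths)

print(combined)
```

Collecting the pieces first and calling `pd.concat` once avoids re-copying an ever-growing DataFrame on every loop iteration, which adds up across 50 files.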

Also, if your file variable actually holds the full path to the file, you can use os.path.basename(file), which returns only the file name, without the directory part.
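Putting the basename tip together with file discovery, here is a minimal sketch that assumes all the CSVs sit in a single folder. The function name `combine_folder`, the use of `glob`, and the sample file names are assumptions for illustration, not taken from the question.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_folder(folder):
    """Stack every .csv in `folder`, adding a column with the bare file name."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, '*.csv'))):
        df_new = pd.read_csv(path)
        # basename drops the directory part, e.g. '/tmp/x/jan.csv' -> 'jan.csv'
        df_new['from_file'] = os.path.basename(path)
        frames.append(df_new)
    return pd.concat(frames, ignore_index=True)


# Demo in a throwaway folder (hypothetical files); notes.txt is skipped by the glob
with tempfile.TemporaryDirectory() as tmp:
    for name, body in [('jan.csv', 'x\n1\n2\n'), ('feb.csv', 'x\n3\n'), ('notes.txt', 'ignored\n')]:
        with open(os.path.join(tmp, name), 'w') as fh:
            fh.write(body)
    combined = combine_folder(tmp)

print(combined)
print(combined['from_file'].value_counts())  # rows contributed by each file
```

Sorting the glob result keeps the row order reproducible, since `glob.glob` makes no ordering guarantee.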
