
Output folder name to a column in dataframe?

from pathlib import Path

import pandas as pd

source_files = sorted(Path('path/folder/subfolder1').glob('**/*.csv'))

dataframes = []
for file in source_files:
    df = pd.read_csv(file, names=['date', 'cost', 'percent'])
    df['Instance Number'] = file.name[:-4]  # file name without the '.csv' extension
    df['Source'] = folder.name  # NameError: folder is not defined
    dataframes.append(df)

combined = pd.concat(dataframes)

combined.to_csv('output.csv', index=False)

I'm getting an error for df['Source'] = folder.name because folder isn't defined. How do I add the name of the folder the files come from to a column? I added the name of the file successfully, now I just need to add the name of the folder.

Try this:

os.path.basename(os.path.dirname(file))

os.path.dirname(file) returns the path of the directory that contains the file. os.path.basename returns the last component of a path as a string, so applying it to that directory path gives just the name of the folder the file is in.
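
As a rough sketch of how the two calls combine, here is the expression wrapped in a small helper. The name folder_name and the sample path are made up for illustration and are not from the question:

```python
import os


def folder_name(file_path):
    """Return the name of the directory that directly contains file_path."""
    # dirname strips the file name; basename keeps only the last component.
    return os.path.basename(os.path.dirname(file_path))


# Hypothetical path, only to show the intermediate values.
sample = 'path/folder/subfolder1/instance42.csv'
print(os.path.dirname(sample))
print(folder_name(sample))
```

Because os.path functions accept path-like objects, the same helper should also work when it is handed one of the Path objects produced by glob in the question's loop.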

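I see you are using the pathlib library. You can get the name of the folder where the file is stored from the file's parent attribute. Note that file.parent on its own is the whole parent path (a Path object), so take its name:

df['Source'] = file.parent.name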

This should work!
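
A minimal sketch of the pathlib attributes involved, assuming a made-up sample path and a hypothetical helper called source_labels. It uses PurePosixPath only so that it runs without touching the file system; the Path objects returned by glob expose the same attributes:

```python
from pathlib import PurePosixPath


def source_labels(file):
    """Return (file name without extension, name of the containing folder)."""
    # stem drops the final suffix, so there is no need to slice off '.csv'.
    return file.stem, file.parent.name


# Hypothetical path, for illustration only.
sample = PurePosixPath('path/folder/subfolder1/instance42.csv')

print(sample.parent)       # the full parent path
print(sample.parent.name)  # only the last folder in that path
print(source_labels(sample))
```

Since the question's glob pattern is recursive ('**/*.csv'), file.parent.name is the folder that directly holds each file, which can be a nested folder rather than subfolder1 itself.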
