Merge multiple CSV files into one and use the file name as a column name in Pandas
I have a directory with a hundred CSV files inside. One of the CSV files looks like this:
Time ID
09:00 A
.. ..
I want to join all of the CSV files into one DataFrame that includes the name of each file (appending along axis=1). I used this code:
import glob
import os
import pandas as pd

files = glob.glob('data/*.csv')
df = pd.concat([pd.read_csv(fp).assign(File=os.path.basename(fp).split('.')[0]) for fp in files], axis=1)
df.to_csv('new.csv')
df
The result I got looks like this:
Time ID File Time ID File ..
09:00 A 01 09:00 B 02 ..
.. .. .. .. .. .. ..
I want to combine the file name with the ID column name and use that as the column name. My expected result looks like this:
Time 01_ID Time 02_ID ..
09:00 A 09:00 B ..
.. .. .. .. ..
You can use a dictionary comprehension first, keyed by file name, so that concat puts the file names into the first level of a MultiIndex in the columns:
comp = {os.path.basename(fp).split('.')[0]: pd.read_csv(fp) for fp in files}
df = pd.concat(comp, axis=1)
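To see what passing a dictionary to pd.concat produces, here is a minimal sketch that uses two small in-memory frames instead of files. The keys '01' and '02' and the sample values are assumptions standing in for the real file names and data.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from 01.csv and 02.csv
comp = {
    '01': pd.DataFrame({'Time': ['09:00'], 'ID': ['A']}),
    '02': pd.DataFrame({'Time': ['09:00'], 'ID': ['B']}),
}

# The dict keys become the outer level of a MultiIndex in the columns
df = pd.concat(comp, axis=1)
cols = df.columns.tolist()
print(cols)
# [('01', 'Time'), ('01', 'ID'), ('02', 'Time'), ('02', 'ID')]
```

Each column label is now a (file name, original column) pair, which is what the next step flattens.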
Then flatten the MultiIndex in the columns with a list comprehension, prefixing only the ID columns with the file name:
df.columns = [f"{a}_{b}" if b == 'ID' else b for a, b in df.columns]
print (df)
Time 01_ID Time 02_ID
0 09:00 A 09:00 B
df.to_csv('new.csv')
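The two steps can be tried end to end with a rough sketch like the one below. It writes two throwaway CSV files to a temporary directory, so the file names, the sample data, and the variable names result and flat_cols are all assumptions made for illustration. It also swaps in sorted() and os.path.splitext, which are not part of the answer: glob makes no ordering guarantee, and splitext keeps file names that contain extra dots intact.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two throwaway CSV files (hypothetical sample data)
tmpdir = tempfile.mkdtemp()
for name, id_value in [('01', 'A'), ('02', 'B')]:
    pd.DataFrame({'Time': ['09:00'], 'ID': [id_value]}).to_csv(
        os.path.join(tmpdir, f'{name}.csv'), index=False)

# sorted() makes the column order deterministic
files = sorted(glob.glob(os.path.join(tmpdir, '*.csv')))

# Key each frame by its file name without the extension
comp = {os.path.splitext(os.path.basename(fp))[0]: pd.read_csv(fp)
        for fp in files}
result = pd.concat(comp, axis=1)

# Prefix only the ID columns; Time keeps its plain name
result.columns = [f"{a}_{b}" if b == 'ID' else b for a, b in result.columns]
flat_cols = list(result.columns)
print(flat_cols)
# ['Time', '01_ID', 'Time', '02_ID']
```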
EDIT: A better solution is to create unique column names by prefixing every column with the file name:
df.columns = df.columns.map('_'.join)
print (df)
01_Time 01_ID 02_Time 02_ID
0 09:00 A 09:00 B
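One reason unique names may be preferable: with repeated Time labels, selecting 'Time' returns every matching column at once. The sketch below compares both variants on hypothetical in-memory data; the names dup, uniq, dup_selection and uniq_selection are made up for this illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from 01.csv and 02.csv
comp = {
    '01': pd.DataFrame({'Time': ['09:00'], 'ID': ['A']}),
    '02': pd.DataFrame({'Time': ['09:00'], 'ID': ['B']}),
}
df = pd.concat(comp, axis=1)

# Variant 1: only ID is prefixed, so 'Time' appears twice
dup = df.copy()
dup.columns = [f"{a}_{b}" if b == 'ID' else b for a, b in dup.columns]
dup_selection = dup['Time']        # DataFrame holding both Time columns

# Variant 2: every column is prefixed, so all names are unique
uniq = df.copy()
uniq.columns = uniq.columns.map('_'.join)
uniq_selection = uniq['01_Time']   # a single Series

print(type(dup_selection).__name__, type(uniq_selection).__name__)
# DataFrame Series
```

Note that '_'.join only works when both levels of the column labels are strings, which holds here because the keys come from file names and the headers from the CSV files.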