
Concatenating CSVs into dataframe with filename column

I am trying to concatenate multiple CSVs that live in subfolders of my parent directory into a single data frame, while also adding a new filename column.

/ParentDirectory
│  
│
├───SubFolder 1
│       test1.csv
│
├───SubFolder 2
│       test2.csv
│
├───SubFolder 3
│       test3.csv
│       test4.csv
│
├───SubFolder 4
│       test5.csv

I can do something like this to concatenate all the CSVs into a single data frame:

import pandas as pd
import glob

files = glob.glob('/ParentDirectory/**/*.csv', recursive=True)
df = pd.concat([pd.read_csv(fp) for fp in files], ignore_index=True)

But is there a way to also add the filename of each file as a column in the final data frame, or do I have to loop through each file individually before concatenating?

The output should look like:

   Col1  Col2    file_name
0  AAAA   XYZ    test1.csv
1  BBBB   XYZ    test1.csv
2  CCCC   RST    test1.csv
3  DDDD   XYZ    test2.csv
4  AAAA   WXY    test3.csv
5  CCCC   RST    test4.csv
6  DDDD   XTZ    test4.csv
7  AAAA   TTT    test4.csv
8  CCCC   RRR    test4.csv
9  AAAA   QQQ    test4.csv

You can assign the file names on the fly:

from pathlib import Path

df = pd.concat([pd.read_csv(fp).assign(file_name=Path(fp).name)
                for fp in files], ignore_index=True)

Here, pathlib.Path extracts the base name of the file from the path: Path(fp).name returns the final path component (for example, test1.csv).
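To try this end to end without touching real data, here is a minimal sketch that builds a small throwaway directory tree and runs the same concat-and-assign pattern on it. The sample file contents and the use of a temporary directory are assumptions made for illustration only; Path.rglob is used in place of glob.glob so that each item is already a Path.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    parent = Path(tmp)

    # Hypothetical sample files, loosely mirroring the layout in the question.
    samples = {
        "SubFolder 1/test1.csv": "Col1,Col2\nAAAA,XYZ\nBBBB,XYZ\n",
        "SubFolder 2/test2.csv": "Col1,Col2\nDDDD,XYZ\n",
    }
    for rel_path, text in samples.items():
        target = parent / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)

    # rglob walks all subfolders; sorting makes the row order reproducible.
    files = sorted(parent.rglob("*.csv"))

    # Each frame gets a constant file_name column before concatenation.
    df = pd.concat(
        [pd.read_csv(fp).assign(file_name=fp.name) for fp in files],
        ignore_index=True,
    )

print(df)
```

Sorting the file list is optional, but glob results are not guaranteed to come back in any particular order, so it keeps the rows grouped predictably.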

A possible solution (you may need to replace / in the code below with the path separator used by your operating system):

df = pd.concat([pd.read_csv(fp).assign(file_name=fp.rsplit('/', 1)[-1])
                for fp in files], ignore_index=True)
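The separator caveat can be seen in a small sketch. The two example paths below are made up for illustration. The standard library modules posixpath and ntpath are what os.path resolves to on POSIX systems and on Windows respectively, so os.path.basename(fp) is one way to avoid hard-coding the separator.

```python
import ntpath
import posixpath

# Hypothetical example paths, one per path style.
posix_fp = "/ParentDirectory/SubFolder 1/test1.csv"
windows_fp = r"C:\ParentDirectory\SubFolder 1\test1.csv"

# Splitting on '/' only works when the path actually contains '/'.
split_posix = posix_fp.rsplit("/", 1)[-1]
split_windows = windows_fp.rsplit("/", 1)[-1]  # no '/' found: whole path is returned

# basename understands the separator of its own path flavour.
posix_name = posixpath.basename(posix_fp)
windows_name = ntpath.basename(windows_fp)

print(split_posix, posix_name, windows_name)
print(split_windows)
```

In the list comprehension above, this would amount to swapping the rsplit call for os.path.basename(fp) after importing os.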
