Using Os and Glob to Search and Concatenate .csv Files and Pandas to Create DataFrame

Problem

I have multiple directories, each with subdirectories. These subdirectories contain .csv files with numerical data in them. I want to use glob and os (not shell scripts) to search two specified directories, locate specific files, and concatenate them in the format I describe below.

dir1 contains subdir1 contains A.csv 
     contains subdir2 contains B.csv

dir2 contains subdir1 contains A.csv
     contains subdir2 contains B.csv

IN BOTH CASES

>>> cat A.csv
1
2
3
4
5
>>> cat B.csv
6
7
8
9
10

MY DESIRED BEHAVIOUR

Find A.csv in dir1 and find A.csv in dir2, searching every folder and subdirectory, and then merge them. After the merge, create a pandas.DataFrame.

>>> python3 merge.py dir1 dir2 A.csv
# prints df created from out.csv
   x   y
0  1   1 
1  2   2 
2  3   3
3  4   4
4  5   5
>>> cat out.csv
1
2
3
4
5
1
2
3
4
5

ASK QUESTIONS IF NEEDED

You can use os.walk to walk through directories and glob.glob to search for *.csv files like so:

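from glob import glob
from os import walk
from os.path import join

root_dir = '/some/path/to_a_directory/'
# walk() yields every directory below root_dir, including root_dir itself
for dirpath, _, _ in walk(root_dir):
    # Search the directory currently being visited, not root_dir again
    all_csv = glob(join(dirpath, '*.csv'))
    for fpath in all_csv:
        # Open the file and do something with it
        with open(fpath) as f:
            data = f.read()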
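
To get closer to the merge.py behaviour described in the question, here is a rough sketch rather than a definitive implementation. It assumes the .csv files have a single column and no header row, and that exactly one matching file is found per directory, so that the two columns can be labelled x and y as in the example output. The function names find_files and merge are made up for illustration. Instead of os.walk it uses glob with the '**' pattern and recursive=True, which searches nested directories in a single call. Since the DataFrame in the question shows the files side by side while out.csv stacks them, the sketch builds the DataFrame from the individual files rather than from out.csv.

```python
import sys
from glob import glob
from os.path import join

import pandas as pd


def find_files(root_dir, filename):
    """Return every path under root_dir, at any depth, named filename."""
    # With recursive=True, '**' matches zero or more nested directories.
    return sorted(glob(join(root_dir, '**', filename), recursive=True))


def merge(dirs, filename, out_path='out.csv'):
    """Stack the matching files into out_path and return them side by side."""
    paths = [p for d in dirs for p in find_files(d, filename)]
    if not paths:
        raise FileNotFoundError(f'{filename} not found under {dirs}')

    # Assumption: single-column files without a header row.
    frames = [pd.read_csv(p, header=None) for p in paths]

    # out.csv: the files appended one after another, as in `cat out.csv`.
    stacked = pd.concat(frames, ignore_index=True)
    stacked.to_csv(out_path, header=False, index=False)

    # DataFrame: one column per file; x and y only when exactly two matched.
    if len(frames) == 2:
        names = ['x', 'y']
    else:
        names = [f'col{i}' for i in range(len(frames))]
    return pd.concat([f[0] for f in frames], axis=1, keys=names)


# Does nothing unless directories and a file name are passed on the command line.
if __name__ == '__main__' and len(sys.argv) > 2:
    *search_dirs, target = sys.argv[1:]
    print(merge(search_dirs, target))
```

Saved as merge.py, this would be called as `python3 merge.py dir1 dir2 A.csv`. If a directory can contain several files with the same name, the column naming and the one-file-per-directory assumption would need to be revisited.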
