Using Os and Glob to Search and Concatenate .csv Files and Pandas to Create DataFrame
Problem
I have multiple directories, each with subdirectories. These subdirectories contain .csv files with numerical data in them. I want to use glob and os (not shell scripts) to search two specified directories, locate specific files in them, and concatenate those files in the format I describe below.
dir1 contains subdir1 contains A.csv
     contains subdir2 contains B.csv
dir2 contains subdir1 contains A.csv
     contains subdir2 contains B.csv
IN BOTH CASES
>>> cat A.csv
1
2
3
4
5
>>> cat B.csv
6
7
8
9
10
MY DESIRED BEHAVIOUR
Find A.csv in dir1 and find A.csv in dir2, searching every folder and subdirectory, and then merge them. After the merge, create a pandas.DataFrame.
>>> python3 merge.py dir1 dir2 A.csv
# prints df created from out.csv
x y
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
>>> cat out.csv
1
2
3
4
5
1
2
3
4
5
ASK QUESTIONS IF NEEDED
You can use os.walk to walk through the directories and glob.glob to search for *.csv files, like so:
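from os import walk
from os.path import join
from glob import glob

root_dir = '/some/path/to_a_directory/'
# walk() yields root_dir itself and then every directory beneath it
for current_dir, _, _ in walk(root_dir):
    # Match the .csv files directly inside the directory being visited
    all_csv = glob(join(current_dir, '*.csv'))
    for fpath in all_csv:
        # Open the file and do something with it
        with open(fpath) as f:
            data = f.read()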
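Putting the pieces together, a merge.py along the lines the question asks for might look like the sketch below. It is only one possible reading of the desired behaviour: the helper names find_files and merge are hypothetical, the files are assumed to hold a single header-less column of numbers, and the x/y DataFrame is assumed to mean "values from the first directory next to values from the second", since the question's printout and its out.csv listing differ in shape.

```python
import os
import sys
from glob import glob, escape

import pandas as pd


def find_files(root_dir, filename):
    """Return the sorted paths of every file named `filename` under root_dir."""
    matches = []
    for current_dir, _, _ in os.walk(root_dir):
        # escape() keeps characters such as '[' in directory names literal
        matches.extend(glob(os.path.join(escape(current_dir), filename)))
    return sorted(matches)


def merge(dir_a, dir_b, filename, out_path="out.csv"):
    """Write the concatenated values to out_path and return an x/y DataFrame."""
    columns = []
    for root_dir in (dir_a, dir_b):
        # Assumption: each file is one header-less column of numbers
        frames = [pd.read_csv(path, header=None)
                  for path in find_files(root_dir, filename)]
        if not frames:
            raise FileNotFoundError(f"{filename} not found under {root_dir}")
        columns.append(pd.concat(frames, ignore_index=True)[0])

    # Stack the values end to end, as in the question's out.csv listing
    stacked = pd.concat(columns, ignore_index=True)
    stacked.to_csv(out_path, index=False, header=False)

    # Place the values side by side; the names x and y follow the printout
    return pd.concat(columns, axis=1, keys=["x", "y"])


if __name__ == "__main__":
    # Expected call: python3 merge.py dir1 dir2 A.csv
    if len(sys.argv) == 4:
        print(merge(*sys.argv[1:4]))
```

If a directory contains several files with the requested name, this sketch stacks them in sorted path order before building the column for that directory; whether that is the right behaviour depends on the data.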