How can I use Python to iterate over files in a directory, given certain constraints, and output a pandas dataframe?

Question

I'm using Python, and I have a parent directory with two child directories, which in turn contain many directories, each with three files. I want to take the third file (a .csv file) from each of these directories and parse them together into a pandas dataframe. This is the code I have so far:

import os

rootdir ='C:\\Dir\\Dir\\Dir\\root(parent)dir'
# os.listdir(rootdir)
# os.getcwd()

filelist = os.listdir(rootdir)
# file_count = len(filelist)

def list_files(dir):
    r = []
    for root, dirs, files in os.walk(dir):
        # if files.startswith('C74'):
            for name in files:
                r.append(os.path.join(root, name))
    return r

filelist = list_files(rootdir)

With "filelist" I now get the paths of all files contained in all directories as strings. Now I need to:

1. Find the file names that begin with three specific letters (for example funtest, where the first three letters are fun).
2. Take every third file and construct a pandas dataframe from those files, so that I can proceed with the data analysis.

Answer 1

IIUC, we can do this much more easily using the recursive rglob method from pathlib:

    from pathlib import Path
    import pandas as pd

    # recursively collect every CSV whose name contains 'C74'
    csv = [f for f in Path(r'parent_dir').rglob('*C74*.csv')]
    df = pd.concat([pd.read_csv(f) for f in csv])
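
The same idea can be wrapped in a small function. This is only a sketch under a few assumptions: the function name `load_matching_csvs` and the `source_file` column are hypothetical additions for illustration, and the default pattern simply reuses the `*C74*.csv` example from above. It also guards against the case where nothing matches, because `pd.concat` raises a `ValueError` when given an empty list.

```python
from pathlib import Path

import pandas as pd


def load_matching_csvs(parent_dir, pattern="*C74*.csv"):
    """Recursively read every CSV under parent_dir whose name matches pattern."""
    paths = sorted(Path(parent_dir).rglob(pattern))
    if not paths:
        # pd.concat raises ValueError on an empty list, so return early.
        return pd.DataFrame()
    frames = []
    for path in paths:
        frame = pd.read_csv(path)
        # Hypothetical extra column: remember which file each row came from.
        frame["source_file"] = path.name
        frames.append(frame)
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Calling `load_matching_csvs(r'parent_dir')` should give one dataframe with the rows of all matching files; whether a `source_file` column is useful depends on the analysis.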

If you want to subset your list again, you could do:

subset_list = [x for x in csv if 'abc' in x.stem]

Test

[x.name for x in csv if 'abc' in x.stem]
out : ['C74_abc.csv', 'abc_C74.csv']
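
The question also asks for the third file of each directory and for a name prefix, which the glob pattern alone does not cover. One possible sketch follows, with loud assumptions: "third" is taken to mean the third file when the names in a directory are sorted alphabetically, the prefix check is applied to that file only, and the function name `third_file_per_directory` and its parameters are hypothetical.

```python
from pathlib import Path


def third_file_per_directory(root_dir, prefix="fun", position=2):
    """Return, per directory, the file at index `position` in sorted name order.

    A file is kept only if its name starts with `prefix` (an assumption about
    how the two constraints in the question combine).
    """
    root = Path(root_dir)
    # Visit the root itself and every nested directory, in a stable order.
    directories = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    selected = []
    for directory in directories:
        files = sorted(f for f in directory.iterdir() if f.is_file())
        if len(files) > position and files[position].name.startswith(prefix):
            selected.append(files[position])
    return selected
```

The returned paths can then be read the same way as above, for example with `pd.concat([pd.read_csv(p) for p in paths])`. If the third file is instead defined by creation time or by extension, the sort key would need to change accordingly.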

Answer 1
1 accepted, 2019-11-21 16:05:49
