
How can I use Python to walk through files in directories and output a pandas data frame given certain constraints?

I'm using Python, and I have a parent directory with two child directories, which in turn contain many directories, each with three files. I want to take the third file (which is a .csv file) from each of these directories and combine them into a pandas DataFrame. This is the code I have so far:

import os

rootdir ='C:\\Dir\\Dir\\Dir\\root(parent)dir'
# os.listdir(rootdir)
# os.getcwd()

filelist = os.listdir(rootdir)
# file_count = len(filelist)

def list_files(dir):
    r = []
    for root, dirs, files in os.walk(dir):
        # if files.startswith('C74'):
            for name in files:
                r.append(os.path.join(root, name))
    return r

filelist = list_files(rootdir)

With "filelist" I now have the paths of all files in all directories as strings. Now I need to:
1. Find the file names that begin with three specific letters (for example funtest, where the first three letters are fun).
2. Take every third file and construct a pandas DataFrame from them, so that I can proceed with the data analysis.

IIUC, we can do this much more easily with rglob, the recursive glob method from pathlib:

    from pathlib import Path
    import pandas as pd
    # recursively collect every CSV whose name contains 'C74'
    csv = list(Path(r'parent_dir').rglob('*C74*.csv'))
    df = pd.concat([pd.read_csv(f) for f in csv])
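
Note that the pattern '*C74*.csv' matches C74 anywhere in the file name. If the names have to begin with the prefix, dropping the leading * should be enough. Below is a minimal sketch under that assumption; the function name load_prefixed_csvs and the source_file column are hypothetical and only there for illustration.

```python
from pathlib import Path

import pandas as pd


def load_prefixed_csvs(root, prefix):
    """Concatenate every CSV under `root` whose file name starts with `prefix`."""
    # rglob searches `root` and all of its subdirectories. The pattern is
    # matched against the file name, so no leading "*" means "starts with".
    paths = sorted(Path(root).rglob(f"{prefix}*.csv"))
    if not paths:
        raise FileNotFoundError(f"no CSV files starting with {prefix!r} under {root}")

    frames = []
    for path in paths:
        frame = pd.read_csv(path)
        # Hypothetical extra column: remember which file each row came from.
        frame["source_file"] = str(path)
        frames.append(frame)

    # ignore_index=True gives the combined frame one continuous 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Calling load_prefixed_csvs(r'parent_dir', 'C74') would then return a single DataFrame. Whether the match is case-sensitive depends on the platform, so files ending in .CSV may need their own pattern on Linux or macOS.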

If you want to subset your list again, you could do:

    subset_list = [x for x in csv if 'abc' in x.stem]

Test:

    [x.name for x in csv if 'abc' in x.stem]
    # out: ['C74_abc.csv', 'abc_C74.csv']
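
The question also asks for the third file in each directory. Here is a rough sketch of one way to pick it, assuming "third" means third in alphabetical order by file name (the question does not say which order is meant). The function name third_file_per_dir and its position parameter are hypothetical.

```python
from pathlib import Path


def third_file_per_dir(root, position=2):
    """Return the file at index `position` (0-based) in each directory under `root`.

    Assumption: files are ordered alphabetically by name within a directory.
    Directories with too few files, or whose selected file is not a CSV,
    are skipped.
    """
    root = Path(root)
    # Look at root itself plus every nested directory.
    directories = [root, *sorted(p for p in root.rglob("*") if p.is_dir())]

    picked = []
    for directory in directories:
        files = sorted(p for p in directory.iterdir() if p.is_file())
        # suffix.lower() accepts both ".csv" and ".CSV".
        if len(files) > position and files[position].suffix.lower() == ".csv":
            picked.append(files[position])
    return picked
```

The returned list of paths can be passed to pd.concat([pd.read_csv(f) for f in ...]) in the same way as the csv list above.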

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 