
Listing files and folders recursively in Python

I have the following tree structure:

custom_test/
├── 110/
│   └── 1548785454_CO_[1].txt
├── 120/
│   └── 1628785454_C4_[1].txt
├── 13031/
│   └── 1544725454_C2_[1].txt
└── test_results/
    ├── resulset1.txt
    └── hey.txt
script.py <------- this is the script which runs the Python code

I want to get the files and subfolders of all folders except test_results (I want to ignore this folder). Using the minimal example above, my desired output is:

['110\\1548785454_CO_[1].txt', '120\\1628785454_C4_[1].txt', '13031\\1544725454_C2_[1].txt']

This is my attempt. It produces the output, but it also includes the files in the test_results folder:

deploy_test_path = "custom_test"
print([os.path.join(os.path.basename(os.path.relpath(os.path.join(filename, os.pardir))), os.path.basename(filename)) for filename in glob.iglob(deploy_test_path + '**/**', recursive=True) if os.path.isfile(filename)])

Without list comprehension (for easier understanding):

deploy_test_path = "custom_test"
for filename in glob.iglob(deploy_test_path + '**/**', recursive=True):
    if os.path.isfile(filename):
        a = os.path.join(os.path.basename(os.path.relpath(os.path.join(filename, os.pardir))), os.path.basename(filename))
        print(a)

How can I achieve my goal? I know I can do it by removing the test_results elements from the list, but is there a more elegant and Pythonic way to do this?

Thanks in advance

Anytime I need to manipulate paths, I turn to pathlib.

Here is how I would do it, more or less:

from pathlib import Path

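base_dir = Path("custom_test")  # renamed from dir, which shadows the built-in
files = base_dir.rglob("*")  # every file and folder below base_dir, recursively
# match() compares from the right, so this drops entries directly inside any test_results folder
res = [f.relative_to(base_dir) for f in files if not f.match("test_results/*")]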

In a one-liner:

from pathlib import Path

res = [f.relative_to("custom_test") for f in Path("custom_test").rglob("*") if not f.match("test_results/*")]

If you only need the files, you can use rglob("*.*") instead (this assumes every file name contains a dot and no folder name does), or filter with is_file():

base_dir = Path("custom_test")
res = [f.relative_to(base_dir) for f in base_dir.rglob("*") if not f.match("test_results/*") and f.is_file()]
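
One caveat with match("test_results/*"): the pattern is compared against the last two path components only, so files in deeper subfolders of test_results would still be listed. Below is a minimal sketch that checks every folder name in the relative path instead. The names list_files and ignored are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def list_files(base, ignored=("test_results",)):
    """Return the files under base, relative to base, skipping ignored folders.

    list_files and ignored are illustrative names, not part of pathlib.
    """
    base = Path(base)
    result = []
    for path in base.rglob("*"):
        rel = path.relative_to(base)
        # rel.parts[:-1] holds the folder names between base and the entry
        if path.is_file() and set(ignored).isdisjoint(rel.parts[:-1]):
            result.append(rel)
    return sorted(result)
```

Calling list_files("custom_test") on the tree from the question should return the three files under 110, 120 and 13031 as Path objects; map them through str() if plain strings are needed.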

I had the same situation and did the following:

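import os

IGNORE_FOLDERS = ("test_results", ".git")  # as many folders as you need to ignore


def get_data(file_path):
    # next() takes only the first os.walk() result, i.e. the top level of file_path
    root, dirnames, filenames = next(os.walk(file_path))
    for dirname in (d for d in dirnames if d not in IGNORE_FOLDERS):
        # list the entries directly inside each folder that is not ignored
        for name in os.listdir(os.path.join(root, dirname)):
            print(os.path.join(dirname, name))  # or save them to a variable if you like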
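
If the tree can be deeper than one level, the same idea can be extended by pruning dirnames in place while os.walk runs top-down, which stops it from descending into the ignored folders at all. This is a sketch under that assumption; walk_files is a hypothetical helper name.

```python
import os

IGNORE_FOLDERS = {"test_results", ".git"}


def walk_files(base):
    """Yield file paths relative to base, never entering ignored folders.

    walk_files is an illustrative name, not a standard-library function.
    """
    for root, dirnames, filenames in os.walk(base):
        # In top-down mode, os.walk only visits the folders left in dirnames,
        # so the list must be modified in place (slice assignment)
        dirnames[:] = [d for d in dirnames if d not in IGNORE_FOLDERS]
        for name in filenames:
            yield os.path.relpath(os.path.join(root, name), base)
```

For example, sorted(walk_files("custom_test")) gives a list of relative paths using the platform's separator, which matches the backslash form in the question when run on Windows.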

    
