
File generator to get files from leaf folders ignoring hidden folders

I have a folder structure with some EPUBs and JSON files in the deepest folders (not counting the .ts folders). I'm exporting tags from the JSON files to TagSpaces by creating a .ts folder containing other JSON files. I've already processed some of the files, and now I want to find the JSON files in the leaf folders that don't have a .ts folder in their path, so that I don't process the same files twice.

I want to process the files in each directory as I find them, instead of first building a list of all the files and then looping through it. That's why I want to write a generator.

In this example I should get test/t1/t2/test.json as the only result, but I'm getting test/t1/test.json instead. That's wrong, because t1 is not a leaf folder.

test
├── t1
│   ├── t2
│   │   └── test.json
│   ├── test.json
│   └── .ts
│       └── test.json
└── .ts
    └── t3
        └── test.json
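To pin down the definition used here: a leaf folder is one with no visible subfolders, where a subfolder counts as hidden if its name starts with a dot (like .ts). A minimal sketch of that check for a single folder, using os.scandir; the helper name is_leaf is hypothetical and appears in neither the question nor the answer:

```python
import os


def is_leaf(folder: str) -> bool:
    """Hypothetical helper: True if folder has no visible subfolders.

    A subfolder counts as hidden if its name starts with '.', e.g. '.ts',
    so a folder whose only subfolder is .ts is still treated as a leaf.
    """
    with os.scandir(folder) as entries:
        return not any(
            entry.is_dir() and not entry.name.startswith('.')
            for entry in entries
        )
```

For the tree above, is_leaf('test/t1') is False because of t2, while is_leaf('test/t1/t2') is True. Inside os.walk the same information is already available in subdirs, so there the check reduces to filtering that list instead of reading each directory a second time.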

This is what I've tried:

import os
import shutil
from pathlib import Path
from typing import List


def file_generator(path: str) -> List[str]:
    for root, subdirs, filenames in os.walk(path):
        # If only hidden folders left, ignore current folder
        if all([d[0] == '.' for d in subdirs]): 
            continue
        # Ignore hidden subfolders
        subdirs[:] = [d for d in subdirs if d[0] != '.']
        # Return files in current folder
        for filename in filenames:
            if filename.endswith('.json'):
                meta_file = os.path.join(root, filename)
                yield meta_file


def test_file_generator():
    try:
        os.makedirs('test/t1/t2', exist_ok=True)
        os.makedirs('test/t1/.ts', exist_ok=True)
        os.makedirs('test/.ts/t3', exist_ok=True)
        Path('test/t1/t2/test.json').touch()
        Path('test/t1/test.json').touch()
        Path('test/t1/.ts/test.json').touch()
        Path('test/.ts/t3/test.json').touch()
        gen = file_generator('test')
        assert tuple(gen) == (os.path.join('test', 't1', 't2', 'test.json'),)
    finally:
        shutil.rmtree('test')

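You've reversed the condition. Because all() returns True for an empty list, your check skips exactly the folders you want (leaf folders, which have no subfolders or only hidden ones) and processes the folders that still have visible subfolders. You also skip at the wrong time: even in a folder you are going to skip, you still want to remove the hidden folders from subdirs first, so that os.walk never descends into them.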
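The root cause is easy to check in isolation: all() returns True for an empty iterable, so in a leaf folder, where os.walk reports no subfolders, the question's skip condition holds and the folder's files are never yielded. A small sketch using the subfolder lists os.walk would report for the example tree (the helper original_skips is illustrative only):

```python
from typing import List


def original_skips(subdirs: List[str]) -> bool:
    """The question's skip condition: True when every subfolder is hidden."""
    return all(d[0] == '.' for d in subdirs)


# test/t1/t2 is a leaf: no subfolders, and all() of nothing is True -> skipped.
print(original_skips([]))  # True

# test/t1 has a visible subfolder (t2), so it is NOT skipped and its
# test.json is yielded, which is the wrong result described in the question.
print(original_skips(['t2', '.ts']))  # False
```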

import os
from typing import Iterator

# The function yields paths instead of returning a list, so I changed the
# annotation to Iterator[str] so that it type-checks.
def file_generator(path: str) -> Iterator[str]:
    for root, subdirs, filenames in os.walk(path):
        # Ignore hidden subfolders
        subdirs[:] = [d for d in subdirs if d[0] != '.']
        # If any subfolders are left, ignore current folder
        if subdirs: 
            continue
        # Yield files in current folder
        for filename in filenames:
            if filename.endswith('.json'):
                meta_file = os.path.join(root, filename)
                yield meta_file
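The fixed generator treats a leaf folder that contains a .ts subfolder like any other leaf, because .ts is pruned before the check. If, in your setup, a .ts folder inside a leaf folder means that folder has already been exported (an assumption; the question's example only exercises the non-leaf case t1), a variant can record that before pruning. The function name unprocessed_leaf_json_files is hypothetical:

```python
import os
from typing import Iterator


def unprocessed_leaf_json_files(path: str) -> Iterator[str]:
    """Yield .json files in leaf folders, skipping leaf folders that already
    contain a .ts subfolder (assumed here to mean "already exported")."""
    for root, subdirs, filenames in os.walk(path):
        # Assumption: a .ts subfolder marks this folder as already processed.
        # Record it before pruning, because pruning removes '.ts' from subdirs.
        already_processed = '.ts' in subdirs
        # Prune hidden subfolders in every folder so os.walk never enters them.
        subdirs[:] = [d for d in subdirs if not d.startswith('.')]
        # Skip non-leaf folders and leaf folders that were already processed.
        if subdirs or already_processed:
            continue
        for filename in filenames:
            if filename.endswith('.json'):
                yield os.path.join(root, filename)
```

Because the pruning happens before the continue, os.walk still never descends into any .ts folder, so files such as test/t1/.ts/test.json are never yielded by either version.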
