
pathlib.Path().glob() and multiple file extensions

I need to specify multiple file extensions, something like pathlib.Path(temp_folder).glob('*.xls', '*.txt').

How can I do this?

https://docs.python.org/dev/library/pathlib.html#pathlib.Path.glob

If you need to use pathlib.Path.glob(), call it once per pattern and combine the results:

from pathlib import Path
def get_files(extensions):
    all_files = []
    for ext in extensions:
        all_files.extend(Path('.').glob(ext))
    return all_files

files = get_files(('*.txt', '*.py', '*.cfg'))
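
A variation on the loop above, as a minimal sketch. The helper name glob_many is hypothetical and not part of pathlib. It chains one glob() call per pattern and removes duplicates, since overlapping patterns such as '*.txt' and 'a*' would otherwise return the same path twice.

```python
from itertools import chain
from pathlib import Path


def glob_many(folder, patterns):
    """Return sorted, de-duplicated paths in `folder` matching any pattern.

    `glob_many` is a hypothetical helper name, not a pathlib API.
    """
    folder = Path(folder)
    # Path.glob() accepts a single pattern, so call it once per pattern.
    matches = chain.from_iterable(folder.glob(p) for p in patterns)
    # A set drops paths that were matched by more than one pattern.
    return sorted(set(matches))
```

It could be called as glob_many('.', ('*.txt', '*.py', '*.cfg')).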

You can also use the ** pattern, which makes glob() collect nested paths recursively, and then filter the results by suffix.

from pathlib import Path
import re


BASE_DIR = Path('.')
EXTENSIONS = {'.xls', '.txt'}

for path in BASE_DIR.glob(r'**/*'):
    if path.suffix in EXTENSIONS:
        print(path)
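
The same idea can be wrapped in a small generator. This is only a sketch: iter_by_suffix is a hypothetical name, and the case-insensitive comparison and the is_file() check are my own additions, assuming that '.TXT' should count as '.txt' and that directories should be skipped.

```python
from pathlib import Path


def iter_by_suffix(base_dir, suffixes):
    """Yield files under `base_dir` (recursively) whose suffix is in `suffixes`.

    Hypothetical helper; the comparison ignores case, so '.TXT' matches '.txt'.
    """
    wanted = {s.lower() for s in suffixes}
    # rglob('*') is like glob('**/*'): it walks all subdirectories.
    for path in Path(base_dir).rglob('*'):
        if path.is_file() and path.suffix.lower() in wanted:
            yield path
```

For example, list(iter_by_suffix('.', {'.xls', '.txt'})) collects the matches into a list.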

If you want to express more logic in your search, you can also use a regex:

pattern_sample = re.compile(r'/(([^/]+/)+)(S(\d+)_\d+)(_flipped)?\.(tif|JPG)')

This pattern looks for images (tif and JPG) whose names match S327_008(_flipped)?.tif in my case. Specifically, it captures the file name (group 3) and the sample id (group 4).

Collecting the results into a set prevents storing duplicates. I have found this useful when adding more logic, for example to ignore different versions of the same file (_flipped).

matched_images = set()

for item in BASE_DIR.glob('**/*'):
    # the pattern starts with '/', so search the absolute, POSIX-style path
    match = pattern_sample.search(item.resolve().as_posix())
    if match:
        # retrieve the groups of interest: file name and sample id
        filename, sample_id = match.group(3, 4)
        matched_images.add((filename, int(sample_id)))
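
To see what the capture groups return, the regex can be tried on plain strings without touching the file system. The sketch below uses a version of the pattern with the dot escaped, an optional _flipped group and a trailing $ (the anchor is my own assumption). The sample paths are made up for illustration.

```python
import re

# Dot escaped, '_flipped' optional, '$' anchors the end of the path.
pattern = re.compile(r'/(([^/]+/)+)(S(\d+)_\d+)(_flipped)?\.(tif|JPG)$')

# Made-up paths, for illustration only.
samples = [
    '/data/run1/S327_008.tif',
    '/data/run1/S327_008_flipped.tif',
    '/data/run2/S12_001.JPG',
    '/data/run2/notes.txt',
]

matched = set()
for text in samples:
    match = pattern.search(text)
    if match:
        # group 3 is the file name stem, group 4 the numeric sample id
        filename, sample_id = match.group(3, 4)
        matched.add((filename, int(sample_id)))

print(sorted(matched))
# [('S12_001', 12), ('S327_008', 327)]
```

The flipped and non-flipped versions of S327_008 collapse into a single entry because both produce the same (filename, sample_id) tuple.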

A bit late to the party, but here are a couple of single-line suggestions that need neither a custom function nor a loop, and that work on Linux:

pathlib.Path.glob() accepts character classes in square brackets, each of which matches exactly one character. For the ".txt" and ".xls" suffixes, one could write

files = pathlib.Path('temp_dir').glob('*.[tx][xl][ts]')

If you need to match ".xlsx" as well, just append the wildcard "*" after the last closing bracket.

files = pathlib.Path('temp_dir').glob('*.[tx][xl][ts]*')

Keep in mind that the wildcard at the end catches not only the "x" but any trailing characters after the last "t" or "s". The character classes are loose as well: they match every combination of the listed letters, such as ".tls" or ".xxt", not only ".txt" and ".xls".

Prepending "**/" to the search pattern makes the search recursive, as discussed in previous answers.
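
How loosely the bracket pattern matches can be checked without creating any files. The sketch below uses fnmatch.fnmatchcase as a stand-in for glob(), on the assumption that both follow the same shell-style rules for character classes. The file names are made up.

```python
from fnmatch import fnmatchcase

pattern = '*.[tx][xl][ts]'
# Made-up file names, for illustration only.
names = ['a.txt', 'a.xls', 'a.tls', 'a.xxt', 'a.png', 'a.xlsx']

# Map each name to whether the bracket pattern accepts it.
results = {name: fnmatchcase(name, pattern) for name in names}

for name, ok in results.items():
    print(name, ok)
# a.txt True
# a.xls True
# a.tls True
# a.xxt True
# a.png False
# a.xlsx False
```

If only the exact suffixes should match, filtering on path.suffix is the safer choice.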

A four-liner solution (plus imports) based on "Check if string ends with one of the strings from a list":

import os
from glob import glob

folder = '.'
suffixes = ('.xls', '.txt')  # str.endswith() accepts a tuple of suffixes
filter_function = lambda x: x.endswith(suffixes)
list(filter(filter_function, glob(os.path.join(folder, '*'))))

Suppose that the following folder structure is prepared.

folder
├── test1.png
├── test1.txt
├── test1.xls
├── test2.png
├── test2.txt
└── test2.xls

The simple answer using pathlib.Path is as follows.

from pathlib import Path

ext = ['.txt', '.xls']
folder = Path('./folder')

# Get a list of pathlib.PosixPath
path_list = sorted(filter(lambda path: path.suffix in ext, folder.glob('*')))
print(path_list)
# [PosixPath('folder/test1.txt'), PosixPath('folder/test1.xls'), PosixPath('folder/test2.txt'), PosixPath('folder/test2.xls')]

If you want the paths as a list of strings, you can convert each one with .as_posix().

# Get a list of string paths
path_list = sorted([path.as_posix() for path in filter(lambda path: path.suffix in ext, folder.glob('*'))])
print(path_list)
# ['folder/test1.txt', 'folder/test1.xls', 'folder/test2.txt', 'folder/test2.xls']
