
Python glob/regex file search for a single result from multiple matches

In Python, I am trying to find a specific file in a directory, let's say 'file3.txt'. The other files in the directory are 'flie1.txt', 'File2.txt', 'file_12.txt', and 'File13.txt'. The number is unique, so I need to search by a user-supplied number.

import glob

file_num = 3
my_file = glob.glob('C:/Path_to_dir/' + r'[a-zA-Z_]*' + f'{file_num}' + '.txt')

The problem is that this returns both 'file3.txt' and 'File13.txt'. If I try a lookbehind, I get no files:

file_num = 3
my_file = glob.glob('C:/Path_to_dir/' + r'[a-zA-Z_]*' + r'(?<![1-9]*)' + f'{file_num}' +  '.txt')

How do I only get 'file3.txt'?

glob accepts Unix wildcards, not regexes. Those are less powerful but what you're asking can still be achieved. This:

glob.glob("/path/to/file/*[!0-9]3.txt")

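matches only the files whose name ends in 3.txt and whose character immediately before the 3 is not a digit. So 'file3.txt' matches but 'File13.txt' does not. Note that [!0-9] must consume one character, so a file named just '3.txt' would not match either.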
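To check the wildcard without touching the disk, here is a minimal sketch that runs the same pattern through fnmatch, the module glob relies on for name matching. The helper name pattern_for is hypothetical, and the file names are the ones from the question.

```python
import fnmatch

# File names taken from the question.
names = ["flie1.txt", "File2.txt", "file3.txt", "file_12.txt", "File13.txt"]

def pattern_for(file_num):
    # Hypothetical helper: "*" matches anything, "[!0-9]" matches exactly
    # one non-digit character, then the number and the extension follow.
    return f"*[!0-9]{file_num}.txt"

pattern = pattern_for(3)
matches = fnmatch.filter(names, pattern)
print(matches)

# "[!0-9]" needs a character to consume, so a bare "3.txt" is rejected.
bare_name_matches = fnmatch.fnmatch("3.txt", pattern)
print(bare_name_matches)
```

With these names only 'file3.txt' survives the filter. The same pattern string can be passed to glob.glob after prepending the directory.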

For other cases, you can use a list comprehension and regex:

[x for x in glob.glob("/path/to/file/*") if re.match(some_regex,os.path.basename(x))]
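As a sketch of how that comprehension might look with a concrete regex, the following builds a throwaway directory containing the question's file names and filters the glob results. The function name find_by_number and the choice of pattern are assumptions, not part of the original answer.

```python
import glob
import os
import re
import tempfile

def find_by_number(directory, file_num):
    # Hypothetical helper: letters or underscores, then exactly the given
    # number, then ".txt". re.escape guards against special characters.
    name_re = re.compile(rf"[A-Za-z_]*{re.escape(str(file_num))}\.txt")
    candidates = glob.glob(os.path.join(directory, "*.txt"))
    # fullmatch requires the whole base name to fit the pattern.
    return sorted(p for p in candidates
                  if name_re.fullmatch(os.path.basename(p)))

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the directory from the question with empty files.
    for name in ["flie1.txt", "File2.txt", "file3.txt",
                 "file_12.txt", "File13.txt"]:
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in find_by_number(tmp, 3)]

print(found)
```

Here glob narrows the candidates to .txt files and the regex does the precise check on the base name, so the directory part of the path cannot interfere with the match.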

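The problem with glob is that it does not support regular expressions, only a small set of shell-style wildcards. For instance, you can't express "[a-z_]+" (one or more letters or underscores) with glob.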

So, it's better to write your own RegEx, like this:

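import os
import re

file_num = 3
# One or more letters/underscores, then the number, then a literal ".txt".
file_re = r"[a-z_]+{file_num}\.txt".format(file_num=file_num)
# IGNORECASE lets the pattern accept 'File...' as well as 'file...'.
# Note that match() anchors only at the start of the name.
match_file = re.compile(file_re, flags=re.IGNORECASE).match

work_dir = "C:/Path_to_dir/"
# Keep only the directory entries whose name matches the pattern.
names = list(filter(match_file, os.listdir(work_dir)))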
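One detail worth checking in the regex approach is anchoring: re.match only anchors at the start of the string, so a name with extra trailing text still passes. The sketch below runs the same pattern over an in-memory list; 'file3.txt.bak' is a hypothetical extra name added purely to show the difference from fullmatch.

```python
import re

# Names from the question, plus one hypothetical backup file.
names = ["flie1.txt", "File2.txt", "file3.txt",
         "file_12.txt", "File13.txt", "file3.txt.bak"]

file_num = 3
pattern = re.compile(rf"[a-z_]+{file_num}\.txt", flags=re.IGNORECASE)

# match() checks only that the name starts with the pattern.
prefix_matches = [n for n in names if pattern.match(n)]
# fullmatch() requires the entire name to fit the pattern.
full_matches = [n for n in names if pattern.fullmatch(n)]

print(prefix_matches)
print(full_matches)
```

If the directory may contain such variants, swapping match for fullmatch (or ending the pattern with $) is one way to keep the result down to the single intended file.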
