How to find the required pattern of a file using regular expressions in Python?

Question

I am trying to match the names of files in my folders against a pattern; the files are PDFs.

I have many PDF files that follow the same pattern but have a different name at the end.

The pattern consists of a date followed by the name of the file.

The problem is that when I run the script, both file names are counted under the first pattern (python_pt) and the elif branch is never reached.

Example:

10-11-2021 python.pdf
22-09-2021 java.pdf

Code:

import re 
import  os 
from os import path 
from tqdm import tqdm
from time import sleep 

python_pt= "^[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$ python.pdf"
java_pt1= "^[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$ java.pdf"
java_pt2= "^ java [0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$.pdf"
str = 'c:'
a = 0
i = 0
for dirpath, dirnames, files in os.walk(src, topdown=True):         
    print(f'\nFound directory: {dirpath}\n')
    
    for  file in tqdm(files):
        sleep(.1)
        full_file_name = os.path.join(dirpath, file)
        if os.path.join(dirpath) == src:
            if file.endswith("pdf"):
                if python_pt:
                    i+=1
                elif java_pt1 or java_pt2:
                    a+=1
print("{} file 1 \n".format(i))
print("{} file 2 \n".format(a))

Answer 1

The problems are with your regular expressions and with the way you perform the regex check:

Anchors must not be placed arbitrarily inside the pattern; a $ in the middle means the pattern can never match, because no characters can follow the end of the string. Since you need to check whether file names end with your pattern, add $ at the end only, and do not forget to escape the literal . (dot).
To check whether there is a match, you need to call one of the re.search / re.match / re.fullmatch functions. A bare pattern string in an if condition is always truthy, so it never tests the file name.
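
Both points can be checked on a single file name without touching the file system. This is a minimal sketch: the variable names broken, fixed and name are illustrative, the first pattern is copied from the question, and the second applies the two corrections described above.

```python
import re

# Pattern as written in the question: `$` sits in the middle and `.` is unescaped.
broken = r"^[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$ python.pdf"
# Corrected pattern: `$` only at the end, literal dot escaped.
fixed = r"[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2} python\.pdf$"

name = "10-11-2021 python.pdf"

# A non-empty string is always truthy, so `if python_pt:` passes for every file.
always_true = bool(broken)

# `$` requires the end of the string, yet more characters must follow it,
# so this search cannot succeed.
broken_match = re.search(broken, name)

# The corrected pattern matches the python file and rejects the java file.
fixed_match = re.search(fixed, name)
java_match = re.search(fixed, "22-09-2021 java.pdf")

print(always_true, broken_match, bool(fixed_match), java_match)
```

Because the original condition was always true, every PDF was counted under the first branch and the elif could never run.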

Here is a fixed snippet:

import re, os
from os import path 
from tqdm import tqdm
from time import sleep 

python_pt= r"[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2} python\.pdf$" # FIXED
java_pt1= r"[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2} java\.pdf$"    # FIXED
java_pt2= r"java [0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}\.pdf$"    # FIXED

src = "C:"
i=0
a=0

for dirpath, dirnames, files in os.walk(src, topdown=True):         
    print(f'\nFound directory: {dirpath}\n')
    
    for  file in tqdm(files):
        sleep(.1)
        full_file_name = os.path.join(dirpath, file)
        if os.path.join(dirpath) == src:
            if file.endswith("pdf"):
                if re.search(python_pt, file):                               # FIXED
                    i+=1
                elif re.search(java_pt1, file) or re.search(java_pt2, file): # FIXED
                    a+=1
print("{} file 1 \n".format(i))
print("{} file 2 \n".format(a))

See the # FIXED lines.
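
If the whole file name should match rather than only its ending, one possible variant is to compile the patterns once and use fullmatch, which makes the trailing $ unnecessary. The sketch below is not part of the original answer: the names DATE, PYTHON_PT, JAVA_PT1, JAVA_PT2 and classify are hypothetical, and the sample file names are made up for illustration.

```python
import re

# Shared date fragment, taken from the patterns in the answer.
DATE = r"[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}"

# Compiled once; fullmatch() below anchors both ends, so no ^ or $ is needed.
PYTHON_PT = re.compile(DATE + r" python\.pdf")
JAVA_PT1 = re.compile(DATE + r" java\.pdf")
JAVA_PT2 = re.compile(r"java " + DATE + r"\.pdf")


def classify(file_name):
    """Return "python", "java", or None for a bare file name (hypothetical helper)."""
    if PYTHON_PT.fullmatch(file_name):
        return "python"
    if JAVA_PT1.fullmatch(file_name) or JAVA_PT2.fullmatch(file_name):
        return "java"
    return None


# Made-up sample names covering each pattern plus a non-matching file.
names = ["10-11-2021 python.pdf", "22-09-2021 java.pdf", "java 01-02-21.pdf", "notes.pdf"]
labels = [classify(n) for n in names]
print(labels)
```

Inside the os.walk loop, classify(file) could replace the two re.search conditions, with the counters incremented according to the returned label.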

Answer 1 (accepted, 2021-11-10 10:24:00)