
How to get a file name using a regular expression in Python

I have a Python script that lists the folders and files in a given path. I want to find a PDF file whose name starts with a number followed by the current date. I used a regular expression for this, but it does not match the expected file.

regular expression = [0-9]+_[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$

Example of a file name: 10204_09-03-2021.pdf

Where is the error in my code?

Code:

for file in files:
   if file.endwith("pdf"):
      if file == re.findall("^[0-9]+_[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$",file)
         shutil.copy(filename,destination)

When your file name is 10204_09-03-2021.pdf, your regex pattern won't match it, because the $ anchor requires the string to end immediately after [0-9]{2}. The file name, however, always ends with .pdf.

Adding \.pdf between [0-9]{2} and $ makes the pattern match. E.g.:

>>> import re
>>> file = "10204_09-03-2021.pdf"
>>> pattern = re.compile(r"[0-9]+_[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}\.pdf$")
>>> pattern.findall(file)
['10204_09-03-2021.pdf']

Also note that in file == re.findall(...), file is a string and re.findall(...) returns a list, so the comparison is False even when the pattern matches. You can use re.fullmatch instead, which returns a Match object (whose boolean value is always True) or None if there is no match. Alternatively, you can test membership with file in re.findall(...).

You can use
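The difference between the three checks can be seen side by side. This is a minimal sketch; the names PATTERN, equals_list, in_list, full and no_match are illustrative and not part of the original code.

```python
import re

PATTERN = r"[0-9]+_[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}\.pdf"
file = "10204_09-03-2021.pdf"

# A str never compares equal to a list, so this is False even on a match.
equals_list = file == re.findall(PATTERN, file)

# Membership works because findall returns the matched substrings.
in_list = file in re.findall(PATTERN, file)

# fullmatch returns a Match object (truthy) or None (falsy).
full = re.fullmatch(PATTERN, file) is not None
no_match = re.fullmatch(PATTERN, "report.pdf") is None

print(equals_list, in_list, full, no_match)  # False True True True
```

Since re.fullmatch already requires the whole string to match, the pattern in this sketch needs no ^ or $ anchors.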

import os, re, shutil
destination = r'new/folder/path'
files = ['folder/10204_09-03-2021.pdf', 'folder/04_09-03-2021.pdf']
rx = re.compile(r'^[0-9]+_[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}\.pdf$', re.I)
for file in files:
    if rx.match(os.path.basename(file)):
        # shutil.copy(file, destination)  # copy the loop variable, not the undefined 'filename'
        print(file)  # Just print here
# prints:
# folder/10204_09-03-2021.pdf
# folder/04_09-03-2021.pdf

See the Python demo and the regex demo.
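To go from printing to actually copying, the loop can be wrapped in a small helper. The following is only a sketch: copy_matching is a hypothetical name, the temporary directories stand in for the real source and destination folders, and the pattern is used with fullmatch so the anchors are dropped.

```python
import os
import re
import shutil
import tempfile

RX = re.compile(r'[0-9]+_[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}\.pdf', re.I)

def copy_matching(source, destination):
    """Copy every file in source whose name matches RX; return the names copied."""
    copied = []
    for name in sorted(os.listdir(source)):
        path = os.path.join(source, name)
        if os.path.isfile(path) and RX.fullmatch(name):
            shutil.copy(path, destination)
            copied.append(name)
    return copied

# Throwaway directories so the sketch runs anywhere (assumption: real paths differ).
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name in ("10204_09-03-2021.pdf", "notes.pdf", "10204_09-03-2021.txt"):
        open(os.path.join(src, name), "w").close()
    copied = copy_matching(src, dst)
    copied_listing = os.listdir(dst)
    print(copied)  # ['10204_09-03-2021.pdf']
```

Matching on the bare file name and joining it with the folder only for the copy keeps the regex independent of the directory layout.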

NOTE: The regex is applied only to the basename of the file path (the file name itself, with its extension). That is why \.pdf is added to the regex pattern, and the ^ / $ anchors make sure the entire string must match the regex. Because of those anchors, using re.match here yields the same results as re.search or re.fullmatch.

Regex details:
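A quick way to check that the three functions agree on an anchored pattern is to run them side by side. This is a sketch with a hypothetical helper named agree; the sample names are made up. The one edge case worth knowing is a name with a trailing newline: $ also matches just before a final newline, so re.match and re.search accept it while re.fullmatch does not.

```python
import re

rx = re.compile(r'^[0-9]+_[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}\.pdf$', re.I)

def agree(name):
    """Return the (match, search, fullmatch) verdicts for one name as booleans."""
    return (bool(rx.match(name)), bool(rx.search(name)), bool(rx.fullmatch(name)))

samples = [
    "10204_09-03-2021.pdf",      # plain match
    "04_9-3-21.PDF",             # short date, upper-case extension
    "x10204_09-03-2021.pdf",     # rejected by ^
    "10204_09-03-2021.pdf.bak",  # rejected by $
    "10204_09-03-2021.pdf\n",    # trailing newline: only fullmatch rejects it
]
for name in samples:
    print(repr(name), agree(name))
```

File names returned by os.listdir do not normally carry a trailing newline, so the edge case mostly matters when names are read from a text file without stripping.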

  • ^ - start of string
  • [0-9]+ - one or more digits
  • _ - a _ char
  • [0-3]?[0-9] - an optional digit from 0 to 3 and then any one digit
  • - - a hyphen
  • [0-3]?[0-9]- - an optional digit from 0 to 3 and then any one digit and then a hyphen
  • (?:[0-9]{2})? - an optional sequence of two digits
  • [0-9]{2} - two digits
  • \.pdf - a literal .pdf string
  • $ - end of string.

The regex is case insensitive due to the re.I flag.
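Note that the pattern accepts any date-shaped string, not specifically the current date mentioned in the question. One possible way to pin it to a given day is to format the date and embed it in the pattern. This is a sketch under stated assumptions: pattern_for is a hypothetical helper, and the zero-padded day-month-year order is a guess based on the single example 09-03-2021.

```python
import re
from datetime import date

def pattern_for(day):
    """Build a regex for '<digits>_<DD-MM-YYYY>.pdf' on the given date."""
    stamp = day.strftime("%d-%m-%Y")  # assumed order: day-month-year, zero-padded
    return re.compile(r"[0-9]+_" + re.escape(stamp) + r"\.pdf", re.I)

# A fixed date keeps the example reproducible; a real script would pass date.today().
rx_today = pattern_for(date(2021, 3, 9))

print(bool(rx_today.fullmatch("10204_09-03-2021.pdf")))  # True
print(bool(rx_today.fullmatch("10204_10-03-2021.pdf")))  # False
```

If the files may use unpadded days or two-digit years, the generic pattern above remains the safer filter, with the date check done separately.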
