
Regular expression to find files with underscores and optional extension

This is for work, so I've changed the extensions and files to protect the innocent.

I am parsing text from a description, looking for a file name in the format word_here; it can have as many underscores as needed plus an optional extension. I was able to come up with this regular expression, which works:

Test 1

import re

text = 'Some text here: * my_file_stuff.mat * other_file * third_file *'

FILE_REG_EX = r'([\w]+_+[\w]+\.*[py|mat]*)'
res = re.findall(FILE_REG_EX, text)

print(res)

Output 1

python test_regex.py

['my_file_stuff.mat', 'other_file', 'third_file']

The problem is that it doesn't work for input like this:

Test 2

import re

text = '|my_file|another_file.mat|O_HERES_ONE|_O_HERES_ANOTHER| | | |'

FILE_REG_EX = r'([\w]+_+[\w]+\.*[py|mat]*)'
res = re.findall(FILE_REG_EX, text)

print(res)

Output 2

python test_regex.py
['my_file|a', 'nother_file.mat|', 'O_HERES_ONE|', '_O_HERES_ANOTHER|']

I modified my regex to include the vertical bar, here:

Test 3

import re

text = '|my_file|another_file.mat|O_HERES_ONE|_O_HERES_ANOTHER| | | |'

FILE_REG_EX = r'([\w]+_+[\w]+\.*[py|mat]*)\|'
res = re.findall(FILE_REG_EX, text)

print(res)

Output 3

python test_regex.py
['my_file', 'another_file.mat', 'O_HERES_ONE', '_O_HERES_ANOTHER']

and that works for the second one but now not for the first one. Part of the issue is that I will be searching a description for text in order to look up where a file is located, and I have no way of knowing what formatting it will use for files, only that they will be something in the form of MY_FILE_HERE01.py, with or without the extension.

I've tried using the not symbol to exclude the vertical bars in front and back, but that seems to come up empty for both strings.

You may use this regex for both kinds of input:

[a-zA-Z\d]+_\w+(?:\.(?:py|mat))?


RegEx Details:

  • [a-zA-Z\d]+ : Match 1+ letters or digits
  • _ : Match an underscore
  • \w+ : Match 1+ word characters
  • (?:\.(?:py|mat))? : Optionally match .py or .mat
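
As a quick check, the pattern above can be run against both sample strings from the question. This is only a minimal sketch: the helper name find_file_names and the variables text_1 and text_2 are made up for illustration and do not appear in the original post. The last line shows why the original attempt picked up the vertical bar: [py|mat] is a character class (any one of p, y, |, m, a, t), not an alternation between "py" and "mat".

```python
import re

# Pattern from the answer above: letters/digits, one underscore, more word
# characters (which may include further underscores), then an optional
# ".py" or ".mat" extension. There are no capturing groups, so findall()
# returns the whole match for each hit.
FILE_REG_EX = re.compile(r'[a-zA-Z\d]+_\w+(?:\.(?:py|mat))?')

# The two sample inputs from the question (variable names are illustrative).
text_1 = 'Some text here: * my_file_stuff.mat * other_file * third_file *'
text_2 = '|my_file|another_file.mat|O_HERES_ONE|_O_HERES_ANOTHER| | | |'


def find_file_names(text):
    """Return every substring of text that matches FILE_REG_EX."""
    return FILE_REG_EX.findall(text)


print(find_file_names(text_1))
# ['my_file_stuff.mat', 'other_file', 'third_file']
print(find_file_names(text_2))
# ['my_file', 'another_file.mat', 'O_HERES_ONE', 'O_HERES_ANOTHER']

# For contrast: the class [py|mat] accepts '|' as an ordinary character,
# so a run such as 'mat|' is matched in full.
print(re.fullmatch(r'[py|mat]*', 'mat|') is not None)
# True
```

Because the match must start with a letter or digit, the leading underscore of _O_HERES_ANOTHER is left out of the result. If that underscore should be kept, the first class would need to allow it as well.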

Is this what you are looking for?

\|?\s*([\w\d]+[\_\w\d]+(?:\.?[\w\d]+[\_\w\d]+)+)\s*\|?
