Regex to find files with underscores and an optional extension

Question

This is for work, so I've changed the extensions and files to protect the innocent.

I am parsing text from a description, looking for a file name in the format word_here. It can have as many underscores as needed, plus an optional extension. I was able to come up with this regular expression, which works:

Test 1

import re

text = 'Some text here: * my_file_stuff.mat * other_file * third_file *'

FILE_REG_EX = r'([\w]+_+[\w]+\.*[py|mat]*)'
res = re.findall(FILE_REG_EX, text)

print(res)

Output 1

python test_regex.py

['my_file_stuff.mat', 'other_file', 'third_file']

The problem is that it doesn't work for input like this:

Test 2

import re

text = '|my_file|another_file.mat|O_HERES_ONE|_O_HERES_ANOTHER| | | |'

FILE_REG_EX = r'([\w]+_+[\w]+\.*[py|mat]*)'
res = re.findall(FILE_REG_EX, text)

print(res)

Output 2

python test_regex.py
['my_file|a', 'nother_file.mat|', 'O_HERES_ONE|', '_O_HERES_ANOTHER|']

I modified my regex to include the vertical bar, here:

Test 3

import re

text = '|my_file|another_file.mat|O_HERES_ONE|_O_HERES_ANOTHER| | | |'

FILE_REG_EX = r'([\w]+_+[\w]+\.*[py|mat]*)\|'
res = re.findall(FILE_REG_EX, text)

print(res)

Output 3

python test_regex.py
['my_file', 'another_file.mat', 'O_HERES_ONE', 'O_HERES_ANOTHER']

That works for the second input, but no longer for the first. Part of the issue is that I will be searching a description for text to look up where a file is located, and I have no way of knowing what formatting it will use for files, only that they will be in the form MY_FILE_HERE01.py, with or without the extension.

I've tried using the not symbol to exclude the vertical bars at the front and back, but that seems to come up empty for both strings.
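
A likely reason the vertical bar leaks into the matches in Output 2: inside square brackets, | is a literal character, so [py|mat] is a character class meaning "any one of p, y, |, m, a, t" rather than "py or mat". The short check below is only a sketch to illustrate that difference; the names char_class and alternation are made up for this example.

```python
import re

# Inside [...], '|' is a literal, so this class accepts any run of
# the characters p, y, |, m, a, t in any order.
char_class = re.compile(r'[py|mat]*')

# Outside brackets, '|' separates alternatives: exactly "py" or "mat".
alternation = re.compile(r'(?:py|mat)?')

print(char_class.fullmatch('mat|') is not None)   # True: the bar is accepted
print(char_class.fullmatch('tamp') is not None)   # True: order does not matter
print(alternation.fullmatch('mat|') is not None)  # False: the bar is rejected
print(alternation.fullmatch('py') is not None)    # True
```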

Answer 1

You may use this regex for both kinds of input:

[a-zA-Z\d]+_\w+(?:\.(?:py|mat))?

RegEx Demo

RegEx Details:

[a-zA-Z\d]+ : Match 1+ letters or digits
_ : Match an underscore
\w+ : Match 1+ word characters
(?:\.(?:py|mat))? : Optionally match .py or .mat
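
As a quick check, here is a sketch that runs this pattern over the two input strings from the question. The variable names text1, text2, res1 and res2 are chosen for this example only. Because the pattern has no capturing groups, re.findall returns the full matches.

```python
import re

# Pattern from this answer: letters/digits, an underscore, more word
# characters, then an optional .py or .mat extension.
FILE_REG_EX = re.compile(r'[a-zA-Z\d]+_\w+(?:\.(?:py|mat))?')

text1 = 'Some text here: * my_file_stuff.mat * other_file * third_file *'
text2 = '|my_file|another_file.mat|O_HERES_ONE|_O_HERES_ANOTHER| | | |'

res1 = FILE_REG_EX.findall(text1)
res2 = FILE_REG_EX.findall(text2)

print(res1)  # ['my_file_stuff.mat', 'other_file', 'third_file']
print(res2)  # ['my_file', 'another_file.mat', 'O_HERES_ONE', 'O_HERES_ANOTHER']
```

Note that the leading underscore of _O_HERES_ANOTHER is not part of the match, because the first character class does not include the underscore.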

Answer 2

Is this what you are looking for?

\|?\s*([\w\d]+[\_\w\d]+(?:\.?[\w\d]+[\_\w\d]+)+)\s*\|?
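
One way to evaluate this pattern is to run it against the first input from the question. The sketch below does that; ANSWER_2, text1 and res are names invented for this example. Since nothing in the pattern requires an underscore, ordinary words of four or more characters, such as Some, also appear in the result, which may not be what the question wants.

```python
import re

# Pattern from this answer, unchanged. The single capturing group means
# re.findall returns only the captured file-name candidate.
ANSWER_2 = re.compile(r'\|?\s*([\w\d]+[\_\w\d]+(?:\.?[\w\d]+[\_\w\d]+)+)\s*\|?')

text1 = 'Some text here: * my_file_stuff.mat * other_file * third_file *'

res = ANSWER_2.findall(text1)
print(res)

# Plain words are matched too, because no underscore is required.
print('Some' in res)               # True
print('my_file_stuff.mat' in res)  # True
```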

2 answers

Answer 1: score 4, accepted, 2020-12-28 18:08:59

Answer 2: score -1, 2020-12-28 18:16:17
