Regular Expression over a file path to match file names that don't start with some word
In a text containing lines with full paths, I need to match only the lines whose file name doesn't start with the word 'TMP' (case insensitive).
In the sample list below, lines marked with "EXCLUDE" shouldn't be matched.
c:\folder1\TMP_file.ext----------EXCLUDE
c:\TMP_folder1\file.ext
c:\folder1\TMP_folder2\file.ext
c:\folder1/TMP_file.ext----------EXCLUDE
c:\file.ext
c:\TMP_file.ext------------------EXCLUDE
TMP_file.ext---------------------EXCLUDE
file.ext
I came up with the simple expression
[^\\/\r\n]+$
(accepting '\' and '/' as directory separators), which successfully matches whole file names with their extensions, but I can't figure out how to add a (?....) group to exclude the matches that start with 'tmp'.
Inverting the expression
tmp[^\\/\r\n]+$
would also be a solution, but I don't know how to do that.
I know this question is similar to others (taking the risk of a downvote...), but I didn't find a way to connect them with this problem.
Regex is not the right solution here. You'd be better off iterating over the paths, taking the base name of each one, and skipping it if it starts with 'TMP'.
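import ntpath

def filter_tmp(text):
    # ntpath splits on both '\' and '/', whatever OS the script runs on
    for p in text.splitlines():
        # lower() makes the 'TMP' check case insensitive
        if not ntpath.basename(p).lower().startswith('tmp'):
            yield p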
Then list(filter_tmp(text)) gives you the list of non-temp paths.
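The same idea can also be sketched with pathlib instead of os.path. This is only an illustration under stated assumptions: the paths are Windows-style, so PureWindowsPath is used because it treats both '\' and '/' as separators on any host OS. SAMPLE and non_tmp_paths are hypothetical names, and the sample lines are the ones from the question without the "EXCLUDE" annotations.

```python
from pathlib import PureWindowsPath

# Sample paths from the question, without the "EXCLUDE" annotations.
SAMPLE = [
    r'c:\folder1\TMP_file.ext',
    r'c:\TMP_folder1\file.ext',
    r'c:\folder1\TMP_folder2\file.ext',
    r'c:\folder1/TMP_file.ext',
    r'c:\file.ext',
    r'c:\TMP_file.ext',
    'TMP_file.ext',
    'file.ext',
]

def non_tmp_paths(lines, prefix='tmp'):
    """Yield the lines whose file name does not start with prefix (case insensitive)."""
    for line in lines:
        # .name is the last path component; PureWindowsPath splits on '\' and '/'
        name = PureWindowsPath(line).name
        if not name.casefold().startswith(prefix.casefold()):
            yield line

if __name__ == '__main__':
    for path in non_tmp_paths(SAMPLE):
        print(path)
```

Only the file name is inspected, so a 'TMP' prefix on a folder name does not exclude a line.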
You can use
(?i)^(?!(?:.*[/\\])?TMP(?![^\W_])[^/\\]*$).+
See the regex demo (there, \n is added to the negated character class, since the regex is tested against a single multiline string).
Details
^ - start of string
(?!(?:.*[/\\])?TMP(?![^\W_])[^/\\]*$) - a negative lookahead that fails the match if, immediately to the right of the current location, there is
  (?:.*[/\\])? - an optional occurrence of any 0 or more chars other than line break chars, as many as possible, and then / or \
  TMP(?![^\W_]) - TMP (case insensitive) not followed by a letter or digit (it can be followed by _)
  [^/\\]* - any 0 or more chars other than / and \ (both separators have to be excluded, otherwise a TMP folder followed by \ would also fail the match)
  $ - end of string.
.+ - one or more chars other than line break chars.
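A minimal sketch of applying this pattern line by line with Python's re module. It assumes the final negated class excludes both separators ([^/\\]*), which the Windows-style sample paths need. NOT_TMP, SAMPLE and keep_matching are illustrative names, and the sample lines are taken from the question without the "EXCLUDE" annotations.

```python
import re

# Assumption: the last class is [^/\\]* so that '\' as well as '/'
# ends the file-name part of the path.
NOT_TMP = re.compile(r'(?i)^(?!(?:.*[/\\])?TMP(?![^\W_])[^/\\]*$).+')

SAMPLE = [
    r'c:\folder1\TMP_file.ext',
    r'c:\TMP_folder1\file.ext',
    r'c:\folder1\TMP_folder2\file.ext',
    r'c:\folder1/TMP_file.ext',
    r'c:\file.ext',
    r'c:\TMP_file.ext',
    'TMP_file.ext',
    'file.ext',
]

def keep_matching(lines):
    """Return the lines that the pattern matches in full."""
    return [line for line in lines if NOT_TMP.fullmatch(line)]

if __name__ == '__main__':
    for path in keep_matching(SAMPLE):
        print(path)
```

Because of the (?![^\W_]) check, a name such as TMPfile.ext is still matched: there TMP is followed by a letter, so it is not treated as the separate word 'TMP'.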