
Regular Expression over a file path to match file names that don't start with some word

In a text containing lines with full paths, I need to match only lines whose file name doesn't start with the word 'TMP' (case insensitive).

In the following sample list, lines marked with "EXCLUDE" shouldn't be matched.

c:\folder1\TMP_file.ext----------EXCLUDE
c:\TMP_folder1\file.ext
c:\folder1\TMP_folder2\file.ext
c:\folder1/TMP_file.ext----------EXCLUDE
c:\file.ext
c:\TMP_file.ext------------------EXCLUDE
TMP_file.ext---------------------EXCLUDE
file.ext

I came up with the simple expression [^\\/\r\n]+$ (accepting '\' and '/' as directory separators), which successfully matches whole file names with their extensions, but I can't figure out how to add (?....) to exclude the matches that start with 'tmp'.

Inverting the expression tmp[^\\/\r\n]+$ would also be a solution, but I don't know how to do that.

I know this question is similar to others (taking the risk of a downvote...) but I didn't find a way to connect them with this problem.

Regex is not the right solution here. You'd better iterate over the paths, take the base name of each one, and skip it if it starts with 'TMP'.

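import os

def filter_tmp(text):
    # Yield lines whose file name does not start with 'TMP' (case insensitive).
    # Note: os.path.basename treats '\' as a separator only on Windows.
    for p in text.splitlines():
        if not os.path.basename(p).upper().startswith('TMP'):
            yield p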

Then list(filter_tmp(text)) would give you the list of non-temp paths.
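If the text has to be processed on a non-Windows machine, a platform-independent variant of the same idea might look like the sketch below. It assumes pathlib's PureWindowsPath is acceptable for parsing, since it treats both '\' and '/' as separators on every OS. The name filter_tmp_lines and the sample variable are hypothetical, and the check is a plain prefix test, so a name like 'tmpfile.ext' is skipped too.

```python
from pathlib import PureWindowsPath


def filter_tmp_lines(text):
    """Yield lines whose file name does not start with 'tmp' (any case)."""
    for line in text.splitlines():
        # PureWindowsPath splits on both '\' and '/' regardless of the host OS.
        name = PureWindowsPath(line).name
        if not name.lower().startswith('tmp'):
            yield line


# Sample paths from the question, without the EXCLUDE markers.
sample = '\n'.join([
    r'c:\folder1\TMP_file.ext',
    r'c:\TMP_folder1\file.ext',
    r'c:\folder1\TMP_folder2\file.ext',
    r'c:\folder1/TMP_file.ext',
    r'c:\file.ext',
    r'c:\TMP_file.ext',
    'TMP_file.ext',
    'file.ext',
])

if __name__ == '__main__':
    for kept in filter_tmp_lines(sample):
        print(kept)
```

Run on the sample list, this keeps only the four paths that are not marked EXCLUDE.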

You can use

(?i)^(?!(?:.*[/\\])?TMP(?![^\W_])[^/\\]*$).+

See the regex demo (there, \n is added to the negated character class since the regex is tested against a single multiline string).

Details

  • ^ - start of string
  • (?!(?:.*[/\\])?TMP(?![^\W_])[^/\\]*$) - a negative lookahead that fails the match if, immediately to the right of the current location, there is
    • (?:.*[/\\])? - an optional sequence of any 0+ chars other than line break chars, as many as possible, followed by / or \
    • TMP(?![^\W_]) - TMP (case insensitive) not followed by a letter or digit (it can be followed by _)
    • [^/\\]* - any 0 or more chars other than / and \ (both separators are excluded so that TMP must begin the last path component; with [^/]* alone, c:\TMP_folder1\file.ext would be rejected)
    • $ - end of string.
  • .+ - one or more chars other than line break chars.
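To check the pattern against the sample list from the question, a minimal Python sketch could look like this. It assumes each path is tested as a separate string, so no multiline handling is needed. It writes the final negated class as [^/\\]* so that a backslash-separated folder whose name starts with TMP is not mistaken for the file name. The names NOT_TMP, SAMPLE_PATHS and keep_non_tmp are hypothetical.

```python
import re

# Assumption: one path per string. The last class excludes both '/' and '\'
# so that TMP has to begin the final path component.
NOT_TMP = re.compile(r'(?i)^(?!(?:.*[/\\])?TMP(?![^\W_])[^/\\]*$).+')

# Sample paths from the question, without the EXCLUDE markers.
SAMPLE_PATHS = [
    r'c:\folder1\TMP_file.ext',
    r'c:\TMP_folder1\file.ext',
    r'c:\folder1\TMP_folder2\file.ext',
    r'c:\folder1/TMP_file.ext',
    r'c:\file.ext',
    r'c:\TMP_file.ext',
    'TMP_file.ext',
    'file.ext',
]


def keep_non_tmp(paths):
    """Return the paths whose file name does not start with the word TMP."""
    return [p for p in paths if NOT_TMP.match(p)]


if __name__ == '__main__':
    for path in keep_non_tmp(SAMPLE_PATHS):
        print(path)
```

Because of (?![^\W_]), TMP is treated as a whole word: 'tmp_file.ext' and 'tmp.ext' are rejected, while 'tmpfile.ext' is still kept.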
