
Regex: match part of a file path if a keyword is not present

I am trying to use a regular expression in Python to match part of a file path, but only if the path does not contain a certain keyword. For example, applying the regular expression to "/exclude/this/test/other" should not match, whereas "/this/test/other" should return the file path without "other", i.e. "/this/test", where "other" can be any directory. So far I am using this:

In [153]: re.findall("^(((?!exclude).)*(?=test).*)?", "/exclude/this/test/other")
Out[153]: [('', '')]

In [152]: re.findall("^(((?!exclude).)*(?=test).*)?", "/this/test/other")
Out[152]: [('/this/test/other', '/')]

but I can't get it to stop matching after "test", and there are also some empty matches. Any ideas?

Just use the in operator if you only need to check whether a keyword is present:

In [33]: s1="/exclude/this/test"

In [34]: s2="this/test"

In [35]: 'exclude' in s1
Out[35]: True

In [36]: 'exclude' in s2
Out[36]: False

EDIT: Or, if you only want the path up to test:

if 'exclude' not in s:
    re.findall(r'(.+test)',s)
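The two steps above can be wrapped in one function. This is only a minimal sketch: the name path_up_to_test and its excluded parameter are made up for illustration, and it keeps the plain substring check from the answer, so a directory such as "excludexx" would also be rejected.

```python
import re

def path_up_to_test(path, excluded="exclude"):
    """Return the part of path up to and including the last 'test', or None.

    Hypothetical helper: the name and the 'excluded' parameter are
    illustrative and not part of the original answer.
    """
    # Step 1: plain substring check, as in the answer above.
    if excluded in path:
        return None
    # Step 2: '.+' is greedy, so the match runs up to the LAST 'test'.
    match = re.search(r'.+test', path)
    return match.group(0) if match else None

print(path_up_to_test("/this/test/other"))          # /this/test
print(path_up_to_test("/exclude/this/test/other"))  # None
```

Using search() instead of findall() gives back a single match object, so there is no list to unpack.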

You're getting the extra result because (1) you're using findall() instead of search(), and (2) you're using capturing groups instead of non-capturing ones:

>>> import re
>>> re.search(r'^(?:(?:(?!exclude).)*(?=test)*)$', "/this/test").group(0)
'/this/test'
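To see both effects in isolation, here is a small sketch that applies the regex from the question to one of its sample paths. The variable names are illustrative only; the non-capturing pattern is simply the question's regex with each "(" changed to "(?:".

```python
import re

# The regex from the question, with capturing groups.
capturing = r'^(((?!exclude).)*(?=test).*)?'
# The same regex with the groups made non-capturing.
non_capturing = r'^(?:(?:(?!exclude).)*(?=test).*)?'
path = "/this/test/other"

# With capturing groups, findall() returns one tuple of group contents per match.
with_groups = re.findall(capturing, path)
# Without capturing groups, findall() returns the whole matched text.
without_groups = re.findall(non_capturing, path)
# search() returns a match object; group(0) is always the whole match.
whole_match = re.search(capturing, path).group(0)

print(with_groups)     # [('/this/test/other', '/')]
print(without_groups)  # ['/this/test/other']
print(whole_match)     # /this/test/other
```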

This will work with findall() too, but that doesn't really make sense when you're matching the whole string. More importantly, the include part of your regex doesn't work. Check this:

>>> re.search(r'^(?:(?:(?!exclude).)*(?=test)*)$', "/this/foo").group(0)
'/this/foo'

That's because the * in (?=test)* makes the lookahead optional, which makes it pointless. But getting rid of the * isn't really a solution, because exclude and test might be part of longer words, like excludexx or yyytest. Here's a better regex:

r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$'

Tested:

>>> re.search(r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$', '/this/test').group()
'/this/test'
>>> re.search(r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$', '/this/foo').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
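The word-boundary behaviour can be checked with a few more paths. The sketch below wraps the regex in a hypothetical helper, matches_whole_path (not part of the answer), which returns None instead of raising AttributeError when nothing matches. Note that \w+ only covers letters, digits and underscores, so path components containing dots or hyphens would not match.

```python
import re

# Regex from the answer: requires a /test component, forbids an /exclude component.
PATTERN = re.compile(r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$')

def matches_whole_path(path):
    """Return the whole path if it matches, else None (hypothetical helper)."""
    m = PATTERN.search(path)
    # search() returns None on failure, so test it before calling group().
    return m.group() if m else None

print(matches_whole_path('/this/test'))          # /this/test
print(matches_whole_path('/this/foo'))           # None
print(matches_whole_path('/exclude/this/test'))  # None
print(matches_whole_path('/excludexx/test'))     # /excludexx/test
print(matches_whole_path('/this/yyytest'))       # None
```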

EDIT: I see you fixed the "optional lookahead" problem, but now the whole regex is optional!

EDIT: If you want it to stop matching after /test, try this:

r'^(?:/(?!test\b|exclude\b)\w+)*/test\b'

(?:/(?!test\b|exclude\b)\w+)* matches zero or more path components, as long as they're not /test or /exclude.
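Applied to the two paths from the question, that regex behaves as sketched below. The helper name prefix_through_test is made up for illustration.

```python
import re

# Regex from the answer: path components other than test/exclude, then /test.
PREFIX = re.compile(r'^(?:/(?!test\b|exclude\b)\w+)*/test\b')

def prefix_through_test(path):
    """Return the path up to and including the first /test component, or None."""
    m = PREFIX.search(path)
    return m.group() if m else None

print(prefix_through_test('/this/test/other'))          # /this/test
print(prefix_through_test('/exclude/this/test/other'))  # None
print(prefix_through_test('/this/testing/other'))       # None
```

One thing to keep in mind: the regex stops at the first /test, so anything after it is not inspected. A path such as '/this/test/exclude' would still return '/this/test'.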

If your match is more complex than what can be done with in and a simple keyword, it may be clearer to use two regexes:

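import re

s1 = "/exclude/this/test"
s2 = "this/test"

for s in (s1, s2):
    # First regex: skip any path that contains the excluded keyword.
    if re.search(r'exclude', s):
        print('excluding:', s)
        continue
    # Second regex: report the matches of interest in the remaining paths.
    print(s, re.findall(r'test', s))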

Prints:

excluding: /exclude/this/test
this/test ['test']

You can make the two-regex version more compact, if that is your goal:

print([(s, re.findall(r'test', s)) for s in (s1, s2) if not re.search(r'exclude', s)])

Edit

If I understand your edit, this works:

s1="/exclude/this/test/other"
s2="/this/test/other"

print([(s, re.search(r'(.*?)/[^/]+$', s).group(1)) for s in (s1, s2) if not re.search(r'exclude', s)])

Prints:

[('/this/test/other', '/this/test')]
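If dropping the last path component is all that is needed, the same result can probably be had without a regex. The sketch below swaps in the standard library's posixpath.dirname for the (.*?)/[^/]+$ pattern and compares whole path components instead of searching for the substring, so it is a different technique from the one above. The function name and the keyword parameter are hypothetical.

```python
import posixpath

def parent_unless_excluded(path, keyword="exclude"):
    """Return path without its last component, or None if a component equals keyword.

    Hypothetical helper, not from the answer above. It compares whole
    components, so a directory named 'excludexx' is NOT treated as excluded.
    """
    if keyword in path.split("/"):
        return None
    # dirname() drops the final component: '/this/test/other' -> '/this/test'
    return posixpath.dirname(path)

print(parent_unless_excluded("/this/test/other"))          # /this/test
print(parent_unless_excluded("/exclude/this/test/other"))  # None
```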
