I am trying to match part of a file path, provided it does not include a certain keyword, using regular expressions in Python. For example, applying the regular expression to "/exclude/this/test/other" should not match, whereas "/this/test/other" should return the file path excluding "other", i.e. "/this/test", where "other" is any directory. So far I am using this:
In [153]: re.findall("^(((?!exclude).)*(?=test).*)?", "/exclude/this/test/other")
Out[153]: [('', '')]
In [152]: re.findall("^(((?!exclude).)*(?=test).*)?", "/this/test/other")
Out[152]: [('/this/test/other', '/')]
But I can't get it to stop matching after "test", and there are also some empty matches. Any ideas?
Just use the in operator if you only need to check whether a keyword is there:
In [33]: s1="/exclude/this/test"
In [34]: s2="this/test"
In [35]: 'exclude' in s1
Out[35]: True
In [36]: 'exclude' in s2
Out[36]: False
EDIT: or if you want the path until test only:
if 'exclude' not in s:
    re.findall(r'(.+test)', s)
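As a rough sketch, the substring check and the regex above could be wrapped in one helper. The name path_up_to_test is hypothetical, and re.search() is used here instead of findall() so that a single string (or None) comes back. Note that .+ is greedy, so the match runs to the last occurrence of "test" in the string, and "test" is matched as a plain substring.

```python
import re

def path_up_to_test(path):
    """Return the path up to and including 'test', or None.

    Hypothetical helper: combines the 'in' check with the regex above.
    """
    if 'exclude' in path:
        # Plain substring check: any path containing 'exclude' is rejected.
        return None
    match = re.search(r'.+test', path)
    return match.group(0) if match else None

print(path_up_to_test("/this/test/other"))          # /this/test
print(path_up_to_test("/exclude/this/test/other"))  # None
```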
You're getting the extra result because (1) you're using findall() instead of search(), and (2) you're using capturing groups instead of non-capturing ones:
>>> import re
>>> re.search(r'^(?:(?:(?!exclude).)*(?=test)*)$', "/this/test").group(0)
'/this/test'
This will work with findall() too, but that doesn't really make sense when you're matching the whole string. More importantly, the include part of your regex doesn't work. Check this:
>>> re.search(r'^(?:(?:(?!exclude).)*(?=test)*)$', "/this/foo").group(0)
'/this/foo'
That's because the * in (?=test)* makes the lookahead optional, which makes it pointless. But getting rid of the * isn't really a solution, because exclude and test might be part of longer words, like excludexx or yyytest. Here's a better regex:
r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$'
Tested:
>>> re.search(r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$', '/this/test').group()
'/this/test'
>>> re.search(r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$', '/this/foo').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
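The traceback above appears because search() returns None when nothing matches. A small sketch of how that case might be handled is shown here; the names PATH_RE and match_path are hypothetical, while the pattern is the one given above.

```python
import re

# Pattern from above: require a /test component, forbid an /exclude component.
PATH_RE = re.compile(r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$')

def match_path(path):
    """Return the whole path if it qualifies, otherwise None."""
    m = PATH_RE.search(path)
    # Guard against None instead of calling .group() unconditionally.
    return m.group() if m is not None else None

print(match_path('/this/test'))       # /this/test
print(match_path('/this/foo'))        # None
print(match_path('/this/yyytest'))    # None ('yyytest' is not the component 'test')
print(match_path('/excludexx/test'))  # /excludexx/test ('excludexx' is not 'exclude')
```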
EDIT: I see you fixed the "optional lookahead" problem, but now the whole regex is optional!
EDIT: If you want it to stop matching after /test, try this:
r'^(?:/(?!test\b|exclude\b)\w+)*/test\b'
(?:/(?!test\b|exclude\b)\w+)* matches zero or more path components, as long as they're not /test or /exclude.
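To illustrate, here is a short sketch that applies this pattern to the two paths from the question. The names STOP_AFTER_TEST and prefix_through_test are hypothetical. One thing to keep in mind: the pattern only inspects components up to the first /test, so an exclude component appearing after /test would not prevent a match.

```python
import re

# Pattern from above: components other than 'test'/'exclude', then '/test'.
STOP_AFTER_TEST = re.compile(r'^(?:/(?!test\b|exclude\b)\w+)*/test\b')

def prefix_through_test(path):
    """Return the path up to and including the first /test, or None."""
    m = STOP_AFTER_TEST.search(path)
    return m.group() if m else None

print(prefix_through_test('/this/test/other'))          # /this/test
print(prefix_through_test('/exclude/this/test/other'))  # None
print(prefix_through_test('/this/testing/other'))       # None
```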
If your match is more complex than what can be done with in and a simple keyword, it might be clearer to use two regexes:
import re

s1 = "/exclude/this/test"
s2 = "this/test"
for s in (s1, s2):
    if re.search(r'exclude', s):
        # Skip any path that contains the excluded keyword.
        print('excluding:', s)
        continue
    print(s, re.findall(r'test', s))
Prints:
excluding: /exclude/this/test
this/test ['test']
You can make the two regexes more compact if that is your goal:
print([(s, re.findall(r'test', s)) for s in (s1, s2) if not re.search(r'exclude', s)])
Edit
If I understand your edit, this works:
s1="/exclude/this/test/other"
s2="/this/test/other"
print([(s, re.search(r'(.*?)/[^/]+$', s).group(1)) for s in (s1, s2) if not re.search(r'exclude', s)])
Prints:
[('/this/test/other', '/this/test')]
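The comprehension above calls .group(1) directly, which would raise AttributeError for a string with no "/" in it. A possible sketch of the same two-regex idea as a function that returns None in that case follows; the name parent_unless_excluded is hypothetical.

```python
import re

def parent_unless_excluded(path):
    """Drop the last path component unless the path contains 'exclude'."""
    if re.search(r'exclude', path):
        return None
    # Same pattern as above: capture everything before the final component.
    m = re.search(r'(.*?)/[^/]+$', path)
    return m.group(1) if m else None

for s in ("/exclude/this/test/other", "/this/test/other", "other"):
    print(s, '->', parent_unless_excluded(s))
```

This prints None for the first and last inputs and /this/test for the second.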