The following regex is supposed to match any :text: that's preceded by start-of-string, whitespace or :, and followed by end-of-string, whitespace or : (along with a few extra rules).
I'm not great at regex but I've come up with the desired solution in regexr.com:
(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)
:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:
Result: :match1:, :match2:, :match3:, :match4:
But on Python 3 this raises an error.
re.search("(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)", txt)
re.error: look-behind requires fixed-width pattern
Anyone know a good workaround for this issue? Any tips are appreciated.
In Python, you may use this workaround to avoid the error:
(?:^|(?<=[\s:]))(:[^\s:]+:)(?=[\s:]|$)
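Anchors ^ and $ are zero-width matchers anyway, so ^ can be moved out of the lookbehind and into an alternation. The lookbehind that remains, (?<=[\s:]), is always exactly one character wide, which is what Python's re module requires.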
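To make the difference concrete, here is a minimal sketch using only the standard-library re module. It tries to compile the pattern from the question, then runs the workaround pattern against the question's sample input. The variable names (ORIGINAL, WORKAROUND, data, matches) are just illustrative choices, not anything from the original posts.

```python
import re

# Sample input copied from the question.
data = """:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:"""

# Pattern from the question: the lookbehind alternatives \s, : and ^
# have widths 1, 1 and 0, so the lookbehind is not fixed-width.
ORIGINAL = r"(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)"

# Workaround pattern: ^ is moved outside the lookbehind, which is then
# always one character wide.
WORKAROUND = r"(?:^|(?<=[\s:]))(:[^\s:]+:)(?=[\s:]|$)"

try:
    re.compile(ORIGINAL)
except re.error as exc:
    # The stdlib re module rejects the variable-width lookbehind.
    print("original pattern rejected:", exc)

# Collect the captured :text: tokens found by the workaround pattern.
matches = [m.group(1) for m in re.finditer(WORKAROUND, data)]
print(matches)
```

With the sample input, the workaround finds :match1:, :match2:, :match3: and :match4: and skips the three matchNot lines.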
Possibly the easiest solution would be to use the newer regex module, which supports variable-width lookbehinds:
import regex as re
data = """:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:"""
for match in re.finditer(r"(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)", data):
    print(match.group(0))
This yields
:match1:
:match2:
:match3:
:match4:
Another option would be to install regex:
$ pip3 install regex
Then we'd write an expression and use (*SKIP)(*FAIL) to discard the patterns that we don't want to match:
import regex as re
expression = r'(?:^\d+:[^:\r\n]+:$|^:[^:\r\n]+:\d+$|^(?!.*:\b\S+\b:).*$)(*SKIP)(*FAIL)|:[a-z0-9]+:'
string = '''
:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:
'''
print(re.findall(expression, string))
Output:
[':match1:', ':match2:', ':match3:', ':match4:']
If you wish to simplify, modify or explore the expression, it is explained in the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
jex.im visualizes regular expressions.