Python regex error: look-behind requires fixed-width pattern

The following regex is supposed to match any :text: that's preceded by start-of-string, whitespace or :, and followed by end-of-string, whitespace or : (along with a few extra rules).

I'm not great at regex, but I've come up with the desired solution on regexr.com:

(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)

Test input:

:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:

Result: :match1:, :match2:, :match3:, :match4:

But in Python 3 this raises an error:

re.search(r"(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)", txt)

re.error: look-behind requires fixed-width pattern

Does anyone know a good workaround for this issue? Any tips are appreciated.
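The error comes from the built-in re module requiring every alternative inside a lookbehind to match the same number of characters. In (?<=\s|:|^), \s and : each consume one character but ^ consumes none, so the lookbehind has no single fixed width. A small check, assuming CPython 3's built-in re (the helper name compiles is hypothetical, for illustration only; the exact error wording may vary between versions):

```python
import re


def compiles(pattern):
    """Hypothetical helper: return True if the built-in re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error as err:
        print(f"{pattern!r}: {err}")
        return False


# Both alternatives are exactly one character wide, so this compiles.
print(compiles(r"(?<=\s|:)x"))

# '^' is zero-width while '\s' and ':' are one character wide,
# so the lookbehind is variable-width and re rejects it.
print(compiles(r"(?<=\s|:|^)x"))
```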

In Python, you can use this workaround to avoid the error:

(?:^|(?<=[\s:]))(:[^\s:]+:)(?=[\s:]|$)

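Anchors ^ and $ are zero-width matchers anyway, so ^ can be moved out of the lookbehind into its own alternative. What remains inside the lookbehind is the single-character class [\s:], which has a fixed width. (The | inside the original class [^\s|:] matched a literal pipe character, so it is dropped here.)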
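A runnable sketch of this workaround with the built-in re module, using the sample input from the question. No MULTILINE flag is needed here, because a newline already counts as \s and therefore satisfies the lookbehind at the start of each line.

```python
import re

data = """:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:"""

# '^' sits outside the lookbehind, so the lookbehind itself is
# exactly one character wide ([\s:]) and re accepts it.
pattern = re.compile(r"(?:^|(?<=[\s:]))(:[^\s:]+:)(?=[\s:]|$)")

# With one capturing group, findall returns that group for each match.
matches = pattern.findall(data)
print(matches)  # [':match1:', ':match2:', ':match3:', ':match4:']
```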


Possibly the easiest solution is to use the third-party regex module, which supports variable-length lookbehinds:

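import regex as re  # third-party module: pip install regex

data = """:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:"""

# Unlike the built-in re, regex accepts the variable-width lookbehind (?<=\s|:|^)
for match in re.finditer(r"(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)", data):
    print(match.group(0))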

This yields

:match1:
:match2:
:match3:
:match4:

Another option is to install the regex module:

$ pip3 install regex

Then write an expression that uses (*SKIP)(*FAIL) to discard the patterns we don't want to match:

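import regex as re

# Left of (*SKIP)(*FAIL): whole lines to discard (000:token:, :token:000,
# and lines containing no :token:, such as :match Not:).
# Right of the final '|': the :token: values to keep.
expression = r'(?:^\d+:[^:\r\n]+:$|^:[^:\r\n]+:\d+$|^(?!.*:\b\S+\b:).*$)(*SKIP)(*FAIL)|:[a-z0-9]+:'
string = '''
:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:

'''

# MULTILINE makes ^ and $ match at every line boundary, which the
# line-level discard branches rely on.
print(re.findall(expression, string, flags=re.MULTILINE))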

Output

[':match1:', ':match2:', ':match3:', ':match4:']

If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against sample inputs.
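The built-in re module has no (*SKIP)(*FAIL) verbs. A common substitute (a different technique from the one above, sketched here only as an assumption-laden alternative) is to list the unwanted forms as plain alternatives that consume text without capturing it, capture only the wanted form in a group, and keep just the matches where that group took part. This sketch covers only the first two discard branches (000:token: and :token:000 lines). It adds re.IGNORECASE on purpose, so that it is the discard branches, not letter case, that reject :matchNot:.

```python
import re

text = """
:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:
"""

# Alternatives 1 and 2 consume unwanted whole lines without capturing;
# only alternative 3 captures a token we want to keep.
pattern = re.compile(
    r"^\d+:[^:\n]+:$"     # discard lines like 000:matchNot:
    r"|^:[^:\n]+:\d+$"    # discard lines like :matchNot:000
    r"|(:[a-z0-9]+:)",    # keep :token:
    re.MULTILINE | re.IGNORECASE,
)

# Keep only the matches where the capturing group participated.
kept = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
print(kept)  # [':match1:', ':match2:', ':match3:', ':match4:']
```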


