
Python regex error: bad character in group name

Can someone tell me why this regex works fine on online regex websites but fails when passed to re.compile() in Python?

I tested it on https://regex101.com/ with this test string:

"test": "value"

Python code

import re
x = r'((?(?=")(?:"(?(?<=\\)(?:.)|(?:[^")]))+")|(?:\w+)))(:|~)\s+((?(?=")(?:"(?(?<=\\)(?:.)|(?:[^"]))+")|(?:\w+)))'
re.compile(x)

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "C:\Python27\lib\re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character in group name

If you need features beyond the standard re module, try the regex module: https://bitbucket.org/mrabarnett/mrab-regex

It is intended as a drop-in replacement for re and supports many additional features, including conditional patterns.

From your example string and the regex101 output, it looks like you are trying to match a Python string with the general form:

"word": "word"

That is to say, groups 1 and 3 are words that can either be wrapped in double quotes or left unquoted, but must not have a hanging quote, and group 2 is a colon or tilde that must be followed by at least one whitespace character. So:

goodString = "\"test\": value"
badString = "test\": value"

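The error message actually hints at the solution! Python's re module only supports conditionals of the form (?(id/name)yes-pattern|no-pattern), so after "(?(" it expects a group number or name, finds the lookahead instead, and reports a bad character in the group name. This question sheds light on the returned error, and the Python documentation gives information on named groups.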
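To make the difference concrete, here is a minimal sketch (the variable names and the simplified patterns are illustrative, not taken from the question). It shows that re rejects a lookahead used as the condition, but accepts a conditional keyed on a group number:

```python
import re

# A lookahead as the condition is not valid in the standard re module:
# after "(?(" it expects a group number or a group name.
try:
    re.compile(r'(?(?=")"\w+"|\w+)')
    lookahead_condition_compiles = True
except re.error:
    lookahead_condition_compiles = False

print(lookahead_condition_compiles)

# What re does accept: a conditional on whether group 1 took part in the match.
# Group 1 is the optional opening quote; the closing quote is required only
# when the opening quote was matched.
quoted_or_bare = re.compile(r'(")?\w+(?(1)")')

for candidate in ['"test"', 'test', '"test', 'test"']:
    print(candidate, bool(quoted_or_bare.fullmatch(candidate)))
```

Only the first two candidates should be accepted, since the other two have a hanging quote.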

By using named groups, you can make your expression shorter and more Pythonic!

x = r'((?P<a>\"?)\w+(?P=a))(:|~)\s+((?P<b>\"?)\w+(?P=b))'

For clarity, counting only the three top-level parts as in your original pattern:

group 1 = ((?P<a>\"?)\w+(?P=a))
group 2 = (:|~), followed by \s+
group 3 = ((?P<b>\"?)\w+(?P=b))

Groups 1 and 3 capture the presence or absence of the opening quotation mark in a subgroup (a and b, respectively), then require that same text again at the end of the word.
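As a small illustration of how the named subgroups behave, the sketch below matches one sample string and reads the captures back (the variable name m is just for this example). Keep in mind that named groups are numbered too, so re reports the three top-level parts as groups 1, 3 and 4:

```python
import re

x = r'((?P<a>\"?)\w+(?P=a))(:|~)\s+((?P<b>\"?)\w+(?P=b))'
m = re.match(x, '"test": value')

print(repr(m.group('a')))  # the opening quote captured for the first word
print(repr(m.group('b')))  # empty string: the second word is unquoted

# Named subgroups a and b occupy numbers 2 and 5, so the first word,
# the separator and the second word are groups 1, 3 and 4.
print(m.group(1), m.group(3), m.group(4))
```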

You do not need to name the groups either! You could simply reference their number:

x = r'((\"?)\w+(\2))(:|~)\s+((\"?)\w+(\6))'

As a final test:

import re

x = r'((\"?)\w+(\2))(:|~)\s+((\"?)\w+(\6))'
goodString = "\"test\": value"
badString = "test\": value"
print(re.match(x, goodString))  # the quotes around the first word are balanced
print(re.match(x, badString))   # hanging quote after the first word

Output (Python 3.7 and later show re.Match instead of _sre.SRE_Match):

<_sre.SRE_Match object; span=(0, 13), match='"test": value'>
None
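One caveat is worth checking, sketched here with a few extra inputs that are not from the question: re.match only anchors at the start of the string, so a stray quote after the second word is ignored unless the whole string is required to match, for example with re.fullmatch (available since Python 3.4).

```python
import re

x = r'((\"?)\w+(\2))(:|~)\s+((\"?)\w+(\6))'

samples = [
    '"test": "value"',  # both words quoted
    'test: "value',     # opening quote without a closing one
    '"test": value"',   # stray quote after the second word
]

for s in samples:
    # match() anchors only at the start; fullmatch() must consume everything
    print(repr(s), bool(re.match(x, s)), bool(re.fullmatch(x, s)))
```

With these inputs, only the last sample gives different answers for the two calls: match() accepts it and fullmatch() rejects it.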
