
Regex match with negative lookbehind, recursive pattern and negative lookahead

I need to match this:

void function{ {  {  } }}   

(a function definition with balanced curly braces), but not this:

static stTLookupTable RxTable[MAX]={
     
    {zero, one},{zero, one},{zero, one}};

I have tried to match with lookarounds using (?<![[=])({((?>[^{}]+|(?R))*)})(?!;), but this matches {zero, one} in the variable declaration.

(?<![[=]){((?>[^{}]+|(?R))*)}[^;]$ doesn't work either.

In short, I need it to match the function definition but not the array declaration, assuming that array initialization starts with ]=. Does anyone know how to match the function definition alone?

PS: {((?>[^{}]+|(?R))*)} matches balanced curly braces.

Assuming you are using the PyPI regex module, you can use

import regex
text = """void function{ {  {  } }}   
static stTLookupTable RxTable[MAX]={
     
    {zero, one},{zero, one},{zero, one}};"""

print( [x.group(3) for x in regex.finditer(r'=\s*({(?>[^{}]+|(?1))*})(*SKIP)(*F)|({((?>[^{}]+|(?2))*)})', text)] )
# => [' {  {  } }']


Details :

  • =\s*({(?>[^{}]+|(?1))*})(*SKIP)(*F) :
    • = - a = char
    • \s* - zero or more whitespaces
    • ({(?>[^{}]+|(?1))*}) - a substring between balanced {...}
    • (*SKIP)(*F) - skips the match and restarts the search from the failure position
  • | - or
  • ({((?>[^{}]+|(?2))*)}) - Group 2 (technical, used for recursion):
    • {((?>[^{}]+|(?2))*)} - matches a {...} substring with balanced curly braces.

The inner capture, Group 3, holds the text between the outermost braces, so you need to return Group 3 from the matches.
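
For readers who cannot install the regex module, the same skip-then-match idea can be approximated with a plain brace counter. This is a minimal sketch using a different technique (manual depth counting, not a regex); the helper names balanced_end and function_bodies are made up for illustration, and it assumes braces never appear inside strings or comments.

```python
def balanced_end(text, start):
    """Return the index just past the '}' closing the '{' at text[start], or -1."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return -1


def function_bodies(text):
    """Collect the contents of top-level {...} blocks that are not introduced by '='."""
    bodies = []
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        end = balanced_end(text, start)
        if end == -1:
            break  # unbalanced input: stop scanning (a simplification)
        # Mirrors the `=\s*{...}` skip branch: ignore blocks preceded by '='
        # and optional whitespace.
        if not text[:start].rstrip().endswith("="):
            bodies.append(text[start + 1:end - 1])
        pos = end  # resume after the whole block, like (*SKIP)
    return bodies


sample = ("void function{ {  {  } }}   \n"
          "static stTLookupTable RxTable[MAX]={\n"
          "     \n"
          "    {zero, one},{zero, one},{zero, one}};")

print(function_bodies(sample))
```

Resuming the scan at the end of each block is what keeps the nested {zero, one} entries from being reported on their own.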

Using (?R) will recurse the whole pattern.
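
To illustrate why that matters, here is a small sketch assuming the PyPI regex module is installed; the sample string and variable names are invented for this illustration, and the outer braces are escaped only to keep the patterns unambiguous. With (?R), the \w+\s* prefix is re-applied at every nested brace, so nesting breaks the match, whereas (?1) recurses only into the brace group.

```python
import regex

sample = "f{ { } }"  # hypothetical input with one level of nesting

# (?R) re-runs the whole pattern, so a nested "{" would need its own word prefix.
whole = regex.search(r"\w+\s*\{(?:[^{}]++|(?R))*\}", sample)

# (?1) re-runs only group 1, the balanced-brace part.
group_only = regex.search(r"\w+\s*(\{(?:[^{}]++|(?1))*\})", sample)

print(whole)               # None
print(group_only.group())  # f{ { } }
```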

You can match void function, or anything other than the [MAX]= prefix, by matching word characters with \w+ or by excluding the unwanted characters with a negated character class such as [^\s{}=,]+, and then recursing into the first subpattern with (?1) using the PyPI regex module.

\w+(?: \w+)*({(?:[^{}]++|(?1))*})

Explanation

  • \w+(?: \w+)* Match 1 or more words before the {
  • ( Capture group 1
    • {(?:[^{}]++|(?1))*} Match the opening and closing curly braces, recursing into the first subpattern with (?1)
  • ) Close group 1


import regex

pattern = r"\w+(?: \w+)*({(?:[^{}]++|(?1))*})"

s = ("void function{ {  {  } }} \n\n\n"
    "static stTLookupTable RxTable[MAX]={\n"
    "     \n"
    "    {zero, one},{zero, one},{zero, one}};")

for match in regex.finditer(pattern, s):
    print(match.group())

Output

void function{ {  {  } }}

To remove the {...} part:

import regex

pattern = r"(\w+(?: \w+)*)({(?:[^{}]++|(?2))*})"

s = ("void function{ {  {  } }} \n\n\n"
    "static stTLookupTable RxTable[MAX]={\n"
    "     \n"
    "    {zero, one},{zero, one},{zero, one}};")

print(regex.sub(pattern, r"\1", s))

