Regular Expression conditional lookahead with next capture group match PCRE

I searched for an answer but did not find anything about this. Hopefully you can help me with my question.

I am trying to match a string using a conditional lookahead that depends on a capture group at the end of the string. That is, if the capture group at the end matches, the conditional group should match one thing; if the capture group at the end does not match, it should match something else.

My regex:

(?:((?(?=ls)yes|no))\${(?:(?P<type>VAR)\s+)([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)\s*\=\s*(\$\{CALL\s+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*\s*\}|\"[^\"]*\"|'[^']*'|[0-9]*|(?:[fF]alse|[tT]rue))\s*\}(?<ls>[^\s]{1}))

Input:

    ${VAR foo="What"}x

    ${VAR foo="What"} 

    yes${VAR foo="What"}

    no${VAR foo="What"}x

As you can see, it captures the word 'no' when there is a character at the end, as long as that character is not whitespace (\s), but it does not capture the word 'yes' when there is nothing at the end.

Your pattern contains (?(?=ls)yes|no), which literally looks ahead for the characters ls. I've changed your pattern to use the DEFINE construct for subpattern reusability. As far as I'm aware, PCRE has no way for a conditional to test whether a group that appears later in the pattern has matched. This can be accomplished in .NET with balancing groups, but PCRE doesn't support them. PCRE does have the (?(name)yes|no) and (?(1)yes|no) conditionals, but they don't work for forward references (comparable to testing whether a variable exists before it has even been declared).

The reworked pattern:

(?(DEFINE)
  (?# var )
  (?<var>[a-zA-Z_\x7f-\xff][\w\x7f-\xff]*)
  (?# val )
  (?<val>(?&call)|(?&str)|(?&num)|(?&bool))
  (?<call>\$\{CALL\s+[a-zA-Z_\x7f-\xff][\w\x7f-\xff]*\s*\})
  (?<str>"[^"]*"|'[^']*')
  (?<num>\d+)
  (?<bool>(?i)(?:false|true)(?-i))
)
((?(?=yes\${VAR\s+(?&var)\s*\=\s*(?&val)\s*\}\s)yes|no))
\${(?P<type>VAR)\s+((?&var))\s*\=\s*((?&val))\s*\}(\S)?

To avoid duplicating the subpattern in the positive lookahead, you can use the following instead. The token (?8) recurses into the 8th capture group, which is the group wrapping the whole ${VAR ...} statement:

(?(DEFINE)
  (?# var )
  (?<var>[a-zA-Z_\x7f-\xff][\w\x7f-\xff]*)
  (?# val )
  (?<val>(?&call)|(?&str)|(?&num)|(?&bool))
  (?<call>\$\{CALL\s+[a-zA-Z_\x7f-\xff][\w\x7f-\xff]*\s*\})
  (?<str>"[^"]*"|'[^']*')
  (?<num>\d+)
  (?<bool>(?i)(?:false|true)(?-i))
)
((?(?=no(?8)\S)no|yes))
(\${(?P<type>VAR)\s+((?&var))\s*\=\s*((?&val))\s*\})(\S)?
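
For engines that lack both lookahead conditionals and DEFINE (Python's built-in re module is one), the same idea can be approximated by rewriting (?(?=COND)no|yes) as the alternation (?=COND)no|(?!COND)yes. The following is only a rough sketch of that rewrite, not a port of the PCRE pattern above: the names STMT_PLAIN, COND, PATTERN and classify are made up for illustration, and the subpatterns are glued together as ordinary strings in place of (?&name) calls.

```python
import re

# Sub-patterns composed as plain strings; this stands in for (?(DEFINE)...).
VAR = r"[a-zA-Z_\x7f-\xff][\w\x7f-\xff]*"
CALL = r"\$\{CALL\s+" + VAR + r"\s*\}"
STR = r'"[^"]*"' + "|" + r"'[^']*'"
NUM = r"\d+"
BOOL = r"(?i:false|true)"
VAL = "(?:" + "|".join([CALL, STR, NUM, BOOL]) + ")"

# The ${VAR name = value} statement, without and with capture groups.
# (Hypothetical helper names; the capture-free copy is used inside lookarounds.)
STMT_PLAIN = r"\$\{VAR\s+" + VAR + r"\s*=\s*" + VAL + r"\s*\}"
STMT = (
    r"\$\{(?P<type>VAR)\s+(?P<var>" + VAR + r")\s*=\s*(?P<val>" + VAL + r")\s*\}"
)

# Condition taken from the second pattern: "no", the statement, then a non-space.
COND = "no" + STMT_PLAIN + r"\S"

# (?(?=COND)no|yes) rewritten as (?=COND)no|(?!COND)yes
PATTERN = re.compile(
    "(?P<prefix>(?=" + COND + ")no|(?!" + COND + ")yes)" + STMT + r"(?P<ls>\S)?"
)


def classify(text):
    """Return (prefix, variable name, trailing char or None), or None if no match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group("prefix"), m.group("var"), m.group("ls")


if __name__ == "__main__":
    samples = [
        'yes${VAR foo="What"}',
        'no${VAR foo="What"}x',
        'no${VAR foo="What"} ',
        '${VAR foo="What"}x',
    ]
    for sample in samples:
        print(repr(sample), "->", classify(sample))
```

Because Python's re does not allow the same group name twice, the statement is written once with captures and once without; in PCRE the (?8) subroutine call avoids that duplication.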
