Regex Lookahead and Lookbehind clarifications

OK, so I know there are some regex questions out here on lookahead and lookbehind, but I haven't found answers to my own questions that I can easily relate to (...oh well).

So here's how I understand Regex lookahead and lookbehind!

Lookaheads/Lookbehinds (LA/LB) :

LA/LB preceding main Regex

 (?=IF_YOU_FIND_WHAT_IS_HERE)START_MATCHING_WHAT_IS_HERE
 (?!IF_YOU_DO_NOT_FIND_WHAT_IS_HERE)START_MATCHING_WHAT_IS_HERE

LA/LB succeeding main Regex

 START_MATCHING_WHAT_IS_HERE(?=UNTIL_THIS IS_NOT TRUE)
 START_MATCHING_WHAT_IS_HERE(?!UNTIL_THIS IS_NOT TRUE)

Ok so for the second part ( succeeding ), I'm really not sure and I would appreciate some rewriting of the above notations or some thumbs up for my excellent understanding (oh yeah).

So back on earth, as I understand it, after each character it matches in the "main" Regex...

  1. Positive lookahead : it checks if what lies ahead still matches with the lookahead part.
  2. Negative lookahead : it checks if what lies ahead still doesn't match the lookahead part.
  3. Positive lookbehind : it checks if what has been matched still matches the lookbehind part.
  4. Negative lookbehind : it checks if what has been matched still doesn't match the lookbehind part.

Now, for the SRLC section (Super Regex Lookout Combos)

Let's look at this Regex

 (?<=REGEX_1)(?<!REGEX_2)((MAIN_REGEX(?<!REGEX_3))(?=REGEX_4))

My strategy in approaching this would be that, in some cases, we could combine REGEX_1 and REGEX_2. If that was the case, we would have:

 (?<=REGEX_C)((MAIN_REGEX(?<!REGEX_3))(?=REGEX_4)) 

C for : Combined

Essentially, what I understand is that :

  1. REGEX_C must succeed first in order for the MAIN_REGEX to start matching.
  2. Then, the MAIN_REGEX starts matching character by character.
  3. Immediately after a positive match, REGEX_3 analyses the global match.
  4. Next is REGEX_4, which looks ahead to see if all is good.
  5. Then we start over from 2 and try matching the next character.
  6. *Of course, if any REGEX fails, the global match is reset.

I have no clue if what I wrote is accurate haha. It's too messy when I want to try it out. Most of the time I succeed by trial and error, but I would like to have some clarifications so I can get it on my first try. Boom

Thanks for your replies!

The key to understanding assertions is that they all involve
looking in a direction from BETWEEN characters, not at, on, before, after
or anything else you can think of.

Since they sit between characters, the regex engine checks them
in a definite order.

The priority for character matching is from left to right .
So is the reading order of a regex.

The priority for assertions is:
An assertion before something is checked first.
An assertion after something is checked last.

And, the position between characters is where it's checked.
You have to imagine yourself at that position when you write the assertion.
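
A minimal sketch of that idea in Python (the sample string and variable names are only assumptions for illustration). The lookbehind is checked at the position just before the digits, the lookahead at the position just after them, and neither one consumes a character:

```python
import re

text = "cost: $42.50"

# (?<=\$) looks left from the position before "4" and needs a "$" there.
# (?=\.)  looks right from the position after "2" and needs a "." there.
m = re.search(r"(?<=\$)\d+(?=\.)", text)
matched = m.group()
span = m.span()

# An assertion on its own matches a position, not a character,
# so the match has the same start and end.
empty_span = re.search(r"(?=\d)", text).span()

print(matched, span, empty_span)
```

The "$" and the "." are looked at but are not part of the match, and the bare lookahead gives a zero-length match.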


Update with more explanation

Usually, the best way to get better used to assertions is to look at examples.

This is your template expression as I see it.

 (?<= REGEX_1 )      # Here is Between characters, look behind for a certain set of chars

 (?<! REGEX_2 )      # At the same place, look behind that a char subset is not there;

 (                   # (1 start)
      MAIN_REGEX          # Some data to match
 )                   # (1 end)

 (?<! REGEX_3 )      # Here is Between the last char matched in group 1
                     # and the next character yet to be matched.
                     # Look behind at the last char(s) matched in group 1
                     # and make sure they are NOT a certain set of chars.

 (?= REGEX_4 )       # At the same place, look ahead that a subset of chars is there
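
To make the template less abstract, here is one possible filling of it in Python. The sub-patterns chosen for REGEX_1 to REGEX_4 and the sample text are made up for illustration, they are not from the question. Note that Python's `re` needs fixed-width lookbehinds.

```python
import re

template = re.compile(r"""
    (?<=\#)      # REGEX_1: a "#" must be right behind this position
    (?<!\#\#)    # REGEX_2: same position, but "##" must NOT be behind it
    (            # (1 start)
      \w+        #   MAIN_REGEX: some word characters to match
    )            # (1 end)
    (?<!\d)      # REGEX_3: the last char matched in group 1 must NOT be a digit
    (?=!)        # REGEX_4: same position, a "!" must come next
""", re.VERBOSE)

sample = "#go! ##no! #v2! #ok!"
found = template.findall(sample)
print(found)  # ['go', 'ok']
```

"##no!" is rejected by the second lookbehind. "#v2!" is rejected because the only place the trailing lookbehind passes (after "v") is not followed by "!". Neither rejection happens per character of the main regex; each assertion is tested only at the position where it sits.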

Here is something more concrete.

This is how the word boundary construct \b would look if written out as a regex.
The word boundary actually only exists between characters.
It looks in both directions in two different ways to satisfy itself.

Study this for a while.

 (?:                           # Cluster start
      (?:                           # -------
           ^                             # Beginning of string anchor
        |                              # or,
           (?<= [^a-zA-Z0-9_] )          # Lookbehind assertion for a char that is NOT a word char
      )                             # -------
      (?= [a-zA-Z0-9_] )            # Lookahead assertion for a char that IS a word char

   |                              # or,

      (?<= [a-zA-Z0-9_] )           # Lookbehind assertion for a char that IS a word char
      (?:                           # -------
           $                             # End of string anchor
        |                              # or,
           (?= [^a-zA-Z0-9_] )           # Lookahead assertion for a char that is NOT a word char
      )                             # -------
 )                             # Cluster end
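
If you want to check this against the real thing, a rough Python sketch could look like the following. It assumes ASCII word characters only (hence `re.ASCII` on the built-in `\b`) and a sample string without a trailing newline; the names `MANUAL_BOUNDARY`, `BUILTIN_BOUNDARY` and `positions` are made up here.

```python
import re

# The hand-written boundary from above, in verbose mode.
MANUAL_BOUNDARY = re.compile(r"""
    (?:
        (?: ^ | (?<=[^a-zA-Z0-9_]) )   # start of string, or a non-word char behind
        (?=[a-zA-Z0-9_])               # and a word char ahead
      |
        (?<=[a-zA-Z0-9_])              # a word char behind
        (?: $ | (?=[^a-zA-Z0-9_]) )    # and end of string, or a non-word char ahead
    )
""", re.VERBOSE)

# The built-in construct, restricted to ASCII so \w means [a-zA-Z0-9_].
BUILTIN_BOUNDARY = re.compile(r"\b", re.ASCII)

def positions(pattern, text):
    """Return the start offset of every (zero-width) match."""
    return [m.start() for m in pattern.finditer(text)]

sample = "hi, there_1!"
manual = positions(MANUAL_BOUNDARY, sample)
builtin = positions(BUILTIN_BOUNDARY, sample)
print(manual)   # [0, 2, 4, 11]
print(builtin)  # [0, 2, 4, 11]
```

Every match is zero-width, so what you get back is a list of positions between characters, which is exactly where the boundary lives.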
