
Understanding Positive Look Ahead Assertion

From Python 3.4.1 docs:

(?=...)

Positive lookahead assertion. This succeeds if the contained regular expression, represented here by ..., successfully matches at the current location, and fails otherwise. But, once the contained expression has been tried, the matching engine doesn't advance at all; the rest of the pattern is tried right where the assertion started.

I'm trying to understand regex in Python. Could you please help me understand the second sentence, especially the bolded words? Any example would be appreciated.

Lookarounds are zero-width assertions. They don't consume any characters in the string.

To touch briefly on the bolded portions of the documentation:

This means that after looking ahead, the regular expression engine is back at the same position on the string from where it started looking. From there, it can start matching again...

The key point:

A zero-width match is a match that does not consume any characters; it only matches a position in the string. The point of a zero-width assertion is to check whether a regular expression can or cannot be matched looking ahead or looking back from the current position, without adding the inspected characters to the overall match.

Generally a Regular Expression engine is "consuming" your string character by character as it matches up with your regular expression.

If you use a look-ahead operator, the engine will instead simply look ahead without "consuming" any characters while it looks for a match.
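To make "not consuming" concrete, here is a minimal sketch using Python's re module. The sample string "xy" and the variable names are only illustrative. It compares where a match ends with and without further pattern text after the lookahead.

```python
import re

text = "xy"

# A lookahead on its own matches a position, not characters,
# so the match is empty and ends where it started.
lookahead_only = re.match(r"(?=x)", text)
print(lookahead_only.span())

# After the lookahead succeeds, the rest of the pattern is tried
# from the same position, so the following x consumes index 0.
lookahead_then_x = re.match(r"(?=x)x", text)
print(lookahead_then_x.span())

# Characters inspected by a lookahead are not part of the overall match:
# y must follow x, but only x is returned.
x_before_y = re.search(r"x(?=y)", text)
print(x_before_y.group())
```

The first span has equal start and end, which is what "zero-width" means in practice.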

Example

A good example is a regular expression to match a password that must contain at least one numeric digit and be between 6 and 20 characters long.

You could write two checks (one to check if a digit exists, and one to check if the string length is as required), or use a single regular expression:

(?=.*\d).{6,20}

The first portion, (?=.*\d), checks whether there is a digit anywhere in the string. When it completes, we are back at the beginning of the string (we were only "looking ahead"), and if it passed, we go on to the next portion of the regex.

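Now .{6,20} is no longer a lookahead, and begins consuming the string. Once it has consumed between 6 and 20 characters, a match has been found. Note that the pattern as written is not anchored, so it can also match part of a longer string; to require that the entire string be 6 to 20 characters long, anchor it (for example with ^ and $, or with re.fullmatch in Python).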
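The password check can be sketched in Python as follows. The helper name is_valid_password and the sample passwords are hypothetical, chosen only for illustration. The use of re.fullmatch is an assumption about the intent (the whole password must be 6 to 20 characters), since the pattern in the answer has no anchors.

```python
import re

# Pattern from the answer: at least one digit, then 6 to 20 characters.
PASSWORD_RE = re.compile(r"(?=.*\d).{6,20}")


def is_valid_password(candidate: str) -> bool:
    """Hypothetical helper: True if the whole string satisfies the pattern."""
    # fullmatch requires the pattern to cover the entire string,
    # which enforces the upper length limit as well.
    return PASSWORD_RE.fullmatch(candidate) is not None


print(is_valid_password("abc123"))         # has a digit, 6 characters
print(is_valid_password("abcdef"))         # no digit: lookahead fails
print(is_valid_password("a1"))             # too short for .{6,20}
print(is_valid_password("a" * 20 + "1"))   # 21 characters: too long

# Without anchoring, search still finds a 20-character match
# inside the 21-character string.
unanchored = PASSWORD_RE.search("a" * 20 + "1")
print(unanchored is not None)
```

The lookahead and the length check both start from the beginning of the string, which is why one pattern can express two independent conditions.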

An answer in example form. On the string "xy":

  • (?:x) will match "x"
  • (?:x)x will not match, because there is no other x after the first x
  • (?:x)y will match "xy", by advancing over x and then y.

  • (?=x) will match "" at the start of the string, since x is following.

  • (?=x)x will match "x" - it recognises that an x follows, and then it advances over it.
  • (?=x)y will not match, since it affirms there is an x following, but then tries to advance over it using y.
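The six cases above can be checked directly in Python. This is a small sketch that uses re.match, so every pattern is tried at the start of "xy"; the results dictionary is just an illustrative way to collect the outcomes.

```python
import re

text = "xy"
patterns = [
    r"(?:x)",    # non-capturing group: consumes x
    r"(?:x)x",   # needs a second x after the first
    r"(?:x)y",   # consumes x, then y
    r"(?=x)",    # lookahead only: matches the empty string at position 0
    r"(?=x)x",   # confirms x follows, then consumes it
    r"(?=x)y",   # confirms x follows, then tries y at the same position
]

# Map each pattern to the matched text, or None if it does not match.
results = {}
for pattern in patterns:
    match = re.match(pattern, text)
    results[pattern] = match.group() if match else None

for pattern, matched in results.items():
    print(pattern, "->", repr(matched))
```

The contrast between (?:x)y and (?=x)y is the core of it: the group advances past x before y is tried, while the lookahead leaves the engine at x.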
