
PHP regex with safe delimiters

I thought that PHP's Perl-compatible regular expression functions (the preg library) supported curly braces as delimiters, so this should be fine:

{ello {world}i // should match on Hello {World

The main point of curly-brace delimiters is that only the leftmost and rightmost braces are treated as delimiters, so the inner ones should not need escaping. As far as I can tell, though, PHP requires the inner brace to be escaped:

{ello \{world}i // this actually matches on Hello {World

Is this expected behavior, or a bug in PHP's preg implementation?
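A small PHP sketch reproducing both patterns against the subject string from the question (the variable names are illustrative). As far as I can tell, PHP counts nested brackets when the delimiter is a bracket, so the unescaped inner { leaves the outer pair unbalanced and preg_match() returns false with a warning about a missing ending delimiter:

```php
<?php
// Reproduces the two patterns from the question.
$subject = 'Hello {World';

// Inner brace escaped: PHP removes the outer { } delimiters and hands
// "ello \{world" to PCRE, where \{ is a literal brace.
$escaped = preg_match('{ello \{world}i', $subject);

// Inner brace unescaped: the inner { is counted as a nested opening
// delimiter, so PHP never finds the closing } and compilation fails.
// @ suppresses the E_WARNING; preg_match() then returns false.
$unescaped = @preg_match('{ello {world}i', $subject);

var_dump($escaped);   // int(1)
var_dump($unescaped); // bool(false)
```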

As far as I know, this is expected behavior; otherwise, how else would the compiler allow quantifier braces? For example:

[a-z]{1,5}

From http://lv.php.net/manual/en/regexp.reference.delimiters.php :

If the delimiter needs to be matched inside the pattern it must be escaped using a backslash. If the delimiter appears often inside the pattern, it is a good idea to choose another delimiter in order to increase readability.

So this is expected behavior, not a bug.
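One way to follow that advice in code is preg_quote(), whose optional second argument names the delimiter to escape in addition to the regex metacharacters. Braces and parentheses are already metacharacters, so only other delimiters such as / need to be passed explicitly. A sketch with made-up subjects:

```php
<?php
// preg_quote() escapes regex metacharacters, which already include
// { } ( ); its optional second argument adds one more character to
// escape, typically a delimiter such as / that is not a metacharacter.
$braced  = '{' . preg_quote('ello {world') . '}i';
$slashed = '/' . preg_quote('a/b', '/') . '/';

echo $braced, "\n";  // {ello \{world}i
echo $slashed, "\n"; // /a\/b/

var_dump(preg_match($braced, 'Hello {World')); // int(1)
var_dump(preg_match($slashed, 'xa/by'));       // int(1)
```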

In Perl, when you use any of the four paired ASCII bracket types as the pattern delimiter, you only need to escape unpaired brackets within the pattern. This is indeed the entire purpose of using brackets. This is documented in the perlop manpage under “Quote and Quote-like Operators”, which reads in part:

   Non-bracketing delimiters use the same character fore and aft, 
   but the four sorts of brackets (round, angle, square, curly) 
   will all nest, which means that

      q{foo{bar}baz}

   is the same as

      'foo{bar}baz'

   Note, however, that this does not always work for quoting Perl code:

      $s = q{ if($a eq "}") ... }; # WRONG
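The nesting rule can be sketched as a scan for the closing delimiter: a backslash skips the next character, and for a bracket delimiter each unescaped opening bracket raises the depth while each closing bracket lowers it. The function name find_closing_delimiter is hypothetical, and this is a simplified illustration rather than PHP's source; PHP's pcre extension appears to use essentially this logic, which would explain why an unpaired { inside {…} needs a backslash:

```php
<?php
/**
 * Hypothetical helper: offset of the delimiter that closes $pattern,
 * or null if there is none. Simplified: assumes the delimiter is the
 * very first character (PHP itself also skips leading whitespace).
 */
function find_closing_delimiter(string $pattern): ?int
{
    if ($pattern === '') {
        return null;
    }
    $pairs = ['(' => ')', '[' => ']', '{' => '}', '<' => '>'];
    $open  = $pattern[0];
    $close = $pairs[$open] ?? $open; // other delimiters close with themselves
    $depth = 1;

    for ($i = 1, $n = strlen($pattern); $i < $n; $i++) {
        $c = $pattern[$i];
        if ($c === '\\') {
            $i++;                         // skip the escaped character
        } elseif ($c === $close) {
            if (--$depth === 0) {
                return $i;                // outermost pair is closed
            }
        } elseif ($c === $open) {
            $depth++;                     // nested bracket of the same kind
        }
    }
    return null;                          // ran off the end: unbalanced
}

var_dump(find_closing_delimiter('{ello \{world}i')); // int(13)
var_dump(find_closing_delimiter('{ello {world}i'));  // NULL
var_dump(find_closing_delimiter('{[a-z]{1,5}}'));    // int(11)
```

For non-bracket delimiters the open and close characters are the same, so the first unescaped occurrence ends the pattern and no nesting is possible.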

That's why you often see people use m{…} or qr{…} in Perl code, especially for multiline patterns used with /x ᴀᴋᴀ (?x). For example:

return qr{
    (?=                     # pure lookahead for conjunctive matching
        \A                  # always from start
        .*?                 # going only as far as we need to to find the pattern
        (?:
            ${case_flag}
            ${left_boundary}
            ${positive_pattern}
            ${right_boundary}
        )
    )
}sxm;

Notice how those nested braces are no problem.
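The same idiom carries over to PHP. Below is a sketch with a made-up pattern: braces as delimiters, the x modifier for comments, and balanced quantifier braces left unescaped. One caveat, as I understand it: PHP locates the closing delimiter before PCRE ever sees the x modifier, so a brace inside a # comment would still count toward the nesting, which is why the comments here avoid them:

```php
<?php
// Brace delimiters plus the x modifier: whitespace is ignored and # starts
// a comment. The quantifier braces pair up, so none of them are escaped.
// PHP locates the closing delimiter before PCRE parses comments, so the
// comments deliberately contain no braces.
$re = '{
    \A              # always from the start
    [a-z]{1,5}      # one to five letters
    \d{2}           # exactly two digits
    \z              # and nothing after them
}xi';

var_dump(preg_match($re, 'Hello42'));  // int(1)
var_dump(preg_match($re, 'Hello421')); // int(0)
```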

I found that no escaping is required in these cases:

'ello {world'i
(ello {world)i

So my theory is that the problem occurs only with '{' as the delimiter. Also, the following two produce the same error:

{ello {world}i
(ello (world)i
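A sketch that runs the patterns from this answer, plus one added escaped variant with its own subject, and prints what preg_match() returns for each (false means the pattern failed to compile). I would expect 1, 1, false, false, 1, which fits a nesting explanation: the failure comes from an unpaired bracket of the same kind as the delimiter, not from { specifically:

```php
<?php
// Each case: [pattern, subject]. The last case is an added escaped variant.
$cases = [
    ["'ello {world'i",  'Hello {World'], // { is not the delimiter
    ['(ello {world)i',  'Hello {World'], // { is not the delimiter
    ['{ello {world}i',  'Hello {World'], // unpaired { inside { } delimiters
    ['(ello (world)i',  'Hello (World'], // unpaired ( inside ( ) delimiters
    ['(ello \(world)i', 'Hello (World'], // unpaired ( escaped with a backslash
];

$results = [];
foreach ($cases as [$pattern, $subject]) {
    // @ hides the compile warning; preg_match() returns false on failure
    $result = @preg_match($pattern, $subject);
    $results[] = $result;
    printf("%-16s => %s\n", $pattern, var_export($result, true));
}
```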

Using opening and closing brackets as delimiters may require you to escape those same brackets when they appear inside the expression.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please credit the site URL or the original address. For questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM