简体   繁体   中英

Differences between two regular expressions

Do anybody know why this regex:

/^(([a-zA-Z0-9\(\)áéíóúÁÉÍÓÚñÑ,\.°-]+ *)+)$/

works but this one doesn't:

/^(([a-zA-Z0-9áéíóúÁÉÍÓÚñÑ,\.°-\(\)]+ *)+)$/

The difference is the place where the parenthesis are... I tryed with some online PHP regex testers and got the same result. The second one simply doesn't work...

PHP returns:

preg_match(): Compilation failed: range out of order in character class at offset 44 in...

This is not a critic question because I've managed to make it work but I have the curiosity!

Maybe the unicode characters are changing something?

When the - character is used inside of brackets (indicating a character set) it indicates a range unless it is the last character in the set, first character in the set, or directly after the opening negating character. Then it means a literal dash. By moving it from the end to the middle you changed its meaning. If you want to keep it in the middle you will need to escape it: \\- .

If the hyphen is placed as the first or last character in the character class, it is treated as a literal - (as opposed to a range), and as a result do not require escaping.

These are the positions where the hyphen do not need to be escaped:

  • right after the opening bracket ( [ ), or
  • right before the closing bracket ( ] ), or
  • right after the negating caret ( ^ )

In the second regular expression, you're placing the hyphen in the middle, and the regular expression engine tries to create a range with the character before the hyphen, the character after the hyphen, and all characters that lie between them in numerical order. As such a range isn't possible, an error message is triggered. See asciitable.com for the character table.

Putting the hyphen last in the expression actually causes it to not require escaping, as it then can't be part of a range, however you might still want to get into the habit of always escaping it.

At your first regex you've managed every thing correctly even that - hyphen which is at the end of it. well it should be there too! I mean it has two places if you don't want to escape it, one place is at the end of char class and the other one at the beginning of char class!

You guessed nice! otherwise you should escape it!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM