
Differences between two regular expressions

Does anybody know why this regex:

/^(([a-zA-Z0-9\(\)áéíóúÁÉÍÓÚñÑ,\.°-]+ *)+)$/

works but this one doesn't:

/^(([a-zA-Z0-9áéíóúÁÉÍÓÚñÑ,\.°-\(\)]+ *)+)$/

The only difference is where the parentheses are placed. I tried several online PHP regex testers and got the same result: the second one simply doesn't work.

PHP returns:

preg_match(): Compilation failed: range out of order in character class at offset 44 in...

This isn't a critical question, since I've managed to make it work, but I'm curious!

Maybe the Unicode characters are changing something?

When the - character is used inside brackets (a character set), it indicates a range unless it is the first character in the set, the last character in the set, or directly after the opening negating character. In those positions it means a literal dash. By moving it from the end to the middle you changed its meaning. If you want to keep it in the middle you will need to escape it: \- .
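As a minimal sketch of that fix, the snippet below takes the failing pattern from the question and escapes the hyphen in place. The test string is made up for illustration, and `@` is used only to silence the compilation warning so the return value can be inspected.

```php
<?php
// The failing pattern from the question: the hyphen sits between ° and \(
// and is therefore read as a range operator.
$broken = '/^(([a-zA-Z0-9áéíóúÁÉÍÓÚñÑ,\.°-\(\)]+ *)+)$/';

// The same pattern with the hyphen escaped, so it is a literal dash.
$fixed = '/^(([a-zA-Z0-9áéíóúÁÉÍÓÚñÑ,\.°\-\(\)]+ *)+)$/';

// Hypothetical input containing a hyphen and parentheses.
$input = 'Calle 5-A (norte)';

var_dump(@preg_match($broken, $input)); // bool(false): the pattern fails to compile
var_dump(preg_match($fixed, $input));   // int(1)
```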

If the hyphen is placed as the first or last character in the character class, it is treated as a literal - (as opposed to a range operator) and therefore does not require escaping.

These are the positions where the hyphen does not need to be escaped:

  • right after the opening bracket ( [ ), or
  • right before the closing bracket ( ] ), or
  • right after the negating caret ( ^ )
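The three positions listed above can be checked with a short script. This is only an illustrative sketch: the patterns and test strings are invented for the example and are not taken from the question.

```php
<?php
// Each class below contains an unescaped hyphen that is read as a literal.
$patterns = [
    'first'       => '/^[-a-z]+$/',  // right after the opening bracket
    'last'        => '/^[a-z-]+$/',  // right before the closing bracket
    'after caret' => '/^[^-a-z]+$/', // right after the negating caret
];

var_dump(preg_match($patterns['first'], 'a-b'));       // int(1)
var_dump(preg_match($patterns['last'], 'a-b'));        // int(1)
var_dump(preg_match($patterns['after caret'], 'a-b')); // int(0): hyphen and a-z are excluded
var_dump(preg_match($patterns['after caret'], '123')); // int(1)
```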

In the second regular expression, you're placing the hyphen in the middle, so the regular expression engine tries to create a range from the character before the hyphen ( ° ) to the character after it ( \( ), including all characters that lie between them in numerical order. Because ° has a higher character code than ( , the range would run backwards. Such a range isn't possible, so an error is triggered. See asciitable.com for the character table.
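To see why the range is out of order, it helps to compare the numeric values of the two endpoints. The sketch below assumes the pattern is UTF-8 encoded and used without the /u modifier, in which case PCRE reads it byte by byte and the range starts at the last byte of the degree sign. The variable names are illustrative only.

```php
<?php
// The degree sign written as explicit UTF-8 bytes, so the result does not
// depend on how this file is saved.
$degree = "\xC2\xB0";
$paren  = '(';

// In byte mode the range starts at the final byte of ° and ends at ( .
$rangeStart = ord($degree[1]); // 0xB0 = 176
$rangeEnd   = ord($paren);     // 0x28 = 40

printf("range start: %d, range end: %d\n", $rangeStart, $rangeEnd);
var_dump($rangeStart > $rangeEnd); // bool(true): the range runs backwards
```

With the /u modifier the endpoints would be the code points U+00B0 and U+0028, which are in the same wrong order.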

Putting the hyphen last in the character class means it does not require escaping, as it then can't be part of a range. However, you might still want to get into the habit of always escaping it.

In your first regex you handled everything correctly, including the hyphen at the end, which is exactly where it should be. If you don't want to escape it, there are two places it can go: the end of the character class or the beginning of the character class.

Good guess! Anywhere else, you should escape it.
