
Java Regex: Why is \177 escape code invalid?

I'm using a common regex to validate email. The pattern is this:

(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-\011\013\014\016-\177])*")@((?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?$)|\[(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\]$

Just in case: I do add all the \\ escaping in Java, so this is the final pattern that Java evaluates. It works in the usual online regex evaluators, but when run in Java it throws

java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 103

which points at the \177 code. Why is that code illegal, and why does it work in online validators?

The javadoc of Pattern gives the answer for you here; quoting:

\0n     The character with octal value 0n (0 <= n <= 7)
\0nn    The character with octal value 0nn (0 <= n <= 7)
\0mnn   The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)

These are the only valid forms of an octal escape sequence; each one starts with \0.
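As a rough illustration of the three forms quoted above, here is a minimal sketch; the class name OctalFormsDemo and the helper fullMatch are made up for this example, and the backslashes are doubled only because the patterns are Java string literals.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class OctalFormsDemo {
    // Returns true if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Regex \07: octal 7, the BEL control character (U+0007).
        System.out.println(fullMatch("\\07", "\u0007"));
        // Regex \077: octal 77 = decimal 63, the '?' character.
        System.out.println(fullMatch("\\077", "?"));
        // Regex \0177: octal 177 = decimal 127, the DEL character (U+007F).
        System.out.println(fullMatch("\\0177", "\u007F"));
    }
}
```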

If you write \1xx, for whatever xx, this will be interpreted as \1 followed by xx, where \1 is a back reference to the first capturing group of the regex...

Except that in your case, index 103 is at:

...37!#-\[\]-\177]|\\
             ^^ HERE

And you are inside a character class, where back references are not allowed. The regex engine therefore tries to interpret \1 as an ordinary escape sequence, and no such escape exists, as the list above shows. Hence the message.
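The difference between the two contexts can be reproduced with a small sketch; the class name BackrefVersusClassDemo and the helper compiles are invented for illustration, and the test strings are arbitrary.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original question.
class BackrefVersusClassDemo {
    // Returns true if the regex compiles, false if it throws PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Outside a class, (a)\177 reads as: group 1, back reference \1, literal "77".
    static boolean backrefMatches(String input) {
        return Pattern.matches("(a)\\177", input);
    }

    public static void main(String[] args) {
        // "aa77" is "a" (group 1) + "a" (back reference) + "77" (literal digits).
        System.out.println(backrefMatches("aa77"));
        // Inside a class \1 cannot be a back reference, so compilation is rejected.
        System.out.println(compiles("[\\177]"));
        // The \0-prefixed octal form is accepted inside a class.
        System.out.println(compiles("[\\0177]"));
    }
}
```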

Replace \177 with \0177 (written \\0177 in a Java string literal) in both places where it appears in the pattern, and your problem will be solved.
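To check the fix in isolation, the sketch below compiles only the first character class of the quoted part of the pattern, with \177 rewritten as \0177. The names FixedCharClassDemo, QTEXT, and isAllowed are hypothetical, and this is not a full email validator.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it covers one fragment of the pattern only.
class FixedCharClassDemo {
    // Regex: [\001-\010\013\014\016-\037!#-\[\]-\0177]
    // Backslashes are doubled because this is a Java string literal.
    static final Pattern QTEXT =
        Pattern.compile("[\\001-\\010\\013\\014\\016-\\037!#-\\[\\]-\\0177]");

    // Returns true if the single character is accepted by the class.
    static boolean isAllowed(char c) {
        return QTEXT.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed('a'));      // inside the \]-\0177 range
        System.out.println(isAllowed('\u007F')); // DEL, the upper bound of that range
        System.out.println(isAllowed('"'));      // double quote, excluded by the class
        System.out.println(isAllowed('\\'));     // backslash, excluded by the class
    }
}
```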

As a side note, validating emails with regexes, while very common, is also a very bad idea. Use javax.mail instead, which can validate mail addresses using InternetAddress.

[Further note: while the link above is to Java EE, you can actually add JavaMail to your project as an independent jar; a quick Maven search will show you how.]

Online regex validators tend to be misleading, because there are many different flavors of regex. You need a Java regex, but your validator is apparently testing a related-but-different flavor.

For your specific issue: as you can see from the documentation for java.util.regex.Pattern, octal escapes have to start with \0, followed by up to three octal digits. So change \177 to \0177 (\\0177 in a Java string literal).

Outside of character classes, Java requires the \0377 form for octal, to distinguish it from a backreference.

Other engines accept the \377 form and check their internal capture group count at that point to distinguish it from a backreference. Some of these engines may not recognize the \nnn form inside classes, but provide an octal bracket form \o{nnn} for that.

As far as I can tell, you can try \0377 inside classes and see if that works; otherwise, I don't know whether Java recognizes octal escapes in classes.
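One way to try this is the short sketch below, which matches U+00FF (decimal 255, octal 377) both outside and inside a character class. The class name OctalInClassDemo and the helper matchesOctal377 are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class OctalInClassDemo {
    // Returns true if the regex matches the single character U+00FF (octal 377).
    static boolean matchesOctal377(String regex) {
        return Pattern.matches(regex, "\u00FF");
    }

    public static void main(String[] args) {
        // Regex \0377 outside a character class.
        System.out.println(matchesOctal377("\\0377"));
        // Regex [\0377] inside a character class.
        System.out.println(matchesOctal377("[\\0377]"));
    }
}
```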
