I'm using a common regex to validate email. The pattern is this:
(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-\011\013\014\016-\177])*")@((?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?$)|\[(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\]$
Just in case: I DO add all the \\ escaping in Java, so this is the final pattern that Java evaluates. It works in normal online regex evaluators, but when running in Java it throws
java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 103
which is the \\177 code. Why is that code illegal, and why does it work in online validators?
The javadoc of java.util.regex.Pattern gives the answer; quoting:
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
These are the only valid forms of an octal escape sequence in a Java regex.
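The three quoted forms can be tried out with a minimal sketch like the one below. It assumes a standard JDK; the class name OctalForms and the helper matches are made up for illustration.

```java
import java.util.regex.Pattern;

public class OctalForms {
    // Returns true when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \07 is octal 7, the BEL control character (U+0007).
        System.out.println(matches("\\07", "\u0007"));
        // \077 is octal 77 = decimal 63, the '?' character.
        System.out.println(matches("\\077", "?"));
        // \0177 is octal 177 = decimal 127, the DEL character (U+007F).
        System.out.println(matches("\\0177", "\u007F"));
        // m must be 0..3, so \0477 is read as \047 (an apostrophe)
        // followed by a literal '7'.
        System.out.println(matches("\\0477", "'7"));
    }
}
```

Each call should return true. The last one illustrates the 0 <= m <= 3 limit on the three-digit form.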
If you write \\1xx, for whatever xx, this will be interpreted as \\1 then xx, where \\1 is a back reference to the first capturing group of the regex...
Except that in your case, index 103 is at:
...37!#-\[\]-\177]|\\
             ^^ HERE
You are within a character class, and back references cannot be used in character classes. The regex engine therefore tries to interpret it as an escape sequence, which is illegal as mentioned above; hence the message.
Replace that with \\0177 (the Java string literal for the regex escape \0177) and your problem will be solved.
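The difference can be reproduced on a small excerpt of the character class. This is a sketch under stated assumptions, not the full email pattern: the class name OctalInClass and the helper compiles are hypothetical, and the excerpt [\]-\177] is cut down from the original regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class OctalInClass {
    // Returns true if java.util.regex accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \177 inside a character class is rejected as an illegal escape.
        System.out.println(compiles("[\\]-\\177]"));
        // With the leading zero, the same range compiles.
        System.out.println(compiles("[\\]-\\0177]"));
        // The fixed range runs from ']' to DEL, so it includes '~'.
        System.out.println(Pattern.matches("[\\]-\\0177]", "~"));
    }
}
```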
As a side note, validating emails with regexes, while it is very common, is also a very bad idea. Use javax.mail instead, which can validate mail addresses using InternetAddress.
[further note: while the link above is to Java EE, actually you can add javamail as an independent jar to your project; a quick maven search will tell you that]
Online regex validators tend to be misleading, because there are many different flavors of regex. You need a Java regex, but your validator is apparently testing against a related-but-different flavor.
For your specific issue: As you can see from the documentation for java.util.regex.Pattern, octal escapes have to start with \\0, but can have up to three digits after that. So, change \\177 to \\0177.
Outside of character classes, Java expects octal in the \\0377 form, to distinguish it from a backreference.
Other engines accept the \\377 form, but rely on their internal capture group count at that point to distinguish it from a backreference.
These other engines won't recognize the \\nnn form inside classes, but provide an octal bracket form \\o{nnn} for that.
As far as I can tell, you can try \\0377 inside classes and see if that works; otherwise, I don't know whether Java recognizes octal escapes in classes.
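A quick way to check both points is a sketch like the following. It assumes a standard JDK; the class name BackrefVersusOctal and the helper matches are illustrative only.

```java
import java.util.regex.Pattern;

public class BackrefVersusOctal {
    // Returns true when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Outside a class, with one capturing group, \177 is read as
        // back reference \1 followed by the literal text "77".
        System.out.println(matches("(a)\\177", "aa77"));
        // With the leading zero it is octal 177, the DEL character.
        System.out.println(matches("(a)\\0177", "a\u007F"));
        // Inside a class, \0377 is accepted as octal 377 (U+00FF).
        System.out.println(matches("[\\0377]", "\u00FF"));
    }
}
```

All three calls should return true, which suggests that Java does accept the zero-prefixed octal form inside character classes.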