
Invalid regex is accepted by Java: is this a Java bug, or a misinterpretation of expectations?

This pattern is not a valid regex according to several websites, yet Java compiles it without complaint:

groovy:000> java.util.regex.Pattern.compile("^*");
===> ^*

But Node rejects the same expression:

$ node
> new RegExp('^*')
SyntaxError: Invalid regular expression: /^*/: Nothing to repeat

Who's right here: Java, or Node and the internet? Or am I just expecting something from the Java libraries that I shouldn't?

I'd say the regex test tools you linked are wrong (in the PCRE sense). I think this is because JS implementations handle these matches differently (see: https://github.com/gskinner/regexr/issues/28).

Note that both regexr and regex101 accept ^()* and (^)*. Also, Perl v5.18.2 has no issue with it: running echo "ubar" | perl -ne "s/^*/F/; print;" from my terminal produces no warnings or errors and prints Fubar.
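For comparison, here is a rough Java counterpart of the Perl one-liner. It is only a sketch: the class name CaretStarSubstitution and the helper substituteFirst are made up for illustration, and the grouped variants are included on the assumption that java.util.regex treats them the same way as the bare ^*.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class CaretStarSubstitution {
    // Mirrors Perl's s/<regex>/F/: replaces the first match of regex with "F".
    static String substituteFirst(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceFirst("F");
    }

    public static void main(String[] args) {
        // The bare pattern from the question plus the two grouped variants.
        for (String regex : new String[] {"^*", "^()*", "(^)*"}) {
            System.out.println(regex + " -> " + substituteFirst(regex, "ubar"));
        }
    }
}
```

With ^* the match is empty and sits at the start of the input, so the replacement is simply prepended, just as in the Perl run.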

This is what the PCRE specification says:

It is possible to construct infinite loops by following a subpattern that can match no characters with a quantifier that has no upper limit, for example:

(a?)*

Earlier versions of Perl and PCRE used to give an error at compile time for such patterns. However, because there are cases where this can be useful, such patterns are now accepted, but if any repetition of the subpattern does in fact match no characters, the loop is forcibly broken.

-- https://www.pcre.org/original/doc/html/pcrepattern.html

So repeating a zero-width match with no upper limit, as ^* does, is accepted by the specification.
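A small Java sketch of the same distinction, assuming java.util.regex behaves here as the question reports: a quantifier applied to a zero-width atom compiles, while a quantifier with nothing in front of it is rejected. The class and method names (QuantifiedAnchorCheck, compiles) are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of any library.
class QuantifiedAnchorCheck {
    // Returns true if java.util.regex can compile the given pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "^*" quantifies a zero-width anchor; "*" has nothing to repeat.
        System.out.println("^*    compiles: " + compiles("^*"));
        System.out.println("*     compiles: " + compiles("*"));
        // The PCRE documentation's example: an unbounded loop over a
        // subpattern that can match the empty string.
        System.out.println("(a?)* compiles: " + compiles("(a?)*"));
        System.out.println("(a?)* matches \"aaa\": " + Pattern.matches("(a?)*", "aaa"));
    }
}
```

So Java only complains when there is literally no atom before the quantifier. That is a narrower rule than the "Nothing to repeat" check Node applies to ^*.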
