
Regex matches in C# but not in Java

I have the following regex (long, I know):

(?-mix:((?-mix:(?-mix:\{\%).*?(?-mix:\%\})|(?-mix:\{\{).*?(?-mix:\}\}?))|(?-mix:\{\{|\{\%)))

that I'm using to split a string. It matches correctly in C#, but when I moved the code to Java, it doesn't match. Is there any particular feature of this regex that is C#-only?

The source is produced as:

String source = Pattern.quote("{% assign foo = values %}.{{ foo[0] }}.");

While in C# it's:

string source = @"{% assign foo = values %}.{{ foo[0] }}.";

The C# version is like this:

string[] split = Regex.Split(source, regex);

In Java I tried both:

String[] split = source.split(regex);

and also

Pattern p = Pattern.compile(regex);
String[] split = p.split(source);

Here is a sample program with your code: http://ideone.com/hk3uy

There is a major difference here between Java and .NET: Java does not add captured groups as tokens in the result array. That means all delimiters are removed from the result, whereas .NET would include them.
The only alternative I know is not to use split, but to collect the list of matches and split manually.
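
A minimal sketch of that manual approach, assuming the goal is simply to keep each matched delimiter as its own token. The class and method names (SplitKeepingDelimiters, splitKeeping) are hypothetical, and the pattern is the simplified one from the other answer in this thread rather than the original with its (?-mix:...) groups. It is not guaranteed to reproduce .NET's Regex.Split output exactly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitKeepingDelimiters {
    // Hypothetical helper: like Pattern.split, but each matched
    // delimiter is also emitted as a token of its own.
    static List<String> splitKeeping(Pattern p, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = p.matcher(input);
        int last = 0;
        while (m.find()) {
            tokens.add(input.substring(last, m.start())); // text before the delimiter
            tokens.add(m.group());                        // the delimiter itself
            last = m.end();
        }
        tokens.add(input.substring(last));                // text after the last delimiter
        return tokens;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");
        String source = "{% assign foo = values %}.{{ foo[0] }}.";
        for (String token : splitKeeping(p, source)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Unlike Pattern.split, this keeps empty tokens, so a delimiter at the very start of the input yields a leading empty string.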

I think the problem is with how you're defining source. On my system, this:

String source = Pattern.quote("{% assign foo = values %}.{{ foo[0] }}.");

is equivalent to this:

String source = "\\Q{% assign foo = values %}.{{ foo[0] }}.\\E";

(that is, it adds a stray \\Q and \\E), but given the way the method is specified, your Java implementation could treat it as equivalent to this:

String source = "\\{% assign foo = values %\\}\\.\\{\\{ foo\\[0\\] \\}\\}\\.";

(that is, inserting lots of backslashes).
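
A small sketch of the point above, assuming nothing beyond the standard java.util.regex API. Pattern.quote returns a regex that matches the text literally, so it belongs on the pattern side, not on the input being split. The class name QuoteDemo, the constants RAW and DELIM, and the use of the simplified pattern are illustrative choices, not part of the original code.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteDemo {
    // The text to split, written as a plain string literal (no Pattern.quote).
    static final String RAW = "{% assign foo = values %}.{{ foo[0] }}.";
    // Simplified delimiter pattern, used here only for illustration.
    static final Pattern DELIM =
            Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");

    public static void main(String[] args) {
        String quoted = Pattern.quote(RAW);

        // The exact quoted form may vary by implementation.
        System.out.println(quoted);
        // It is a pattern for RAW, not RAW itself.
        System.out.println(quoted.equals(RAW));                             // false
        System.out.println(Pattern.compile(quoted).matcher(RAW).matches()); // true

        // Split the raw text, which is what the C# version operates on.
        System.out.println(Arrays.toString(DELIM.split(RAW)));
    }
}
```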

Your regex itself seems fine. This program:

import java.util.regex.Pattern;

public class SplitDemo
{
    public static void main(final String... args)
    {
        final Pattern p = Pattern.compile("(?-mix:((?-mix:(?-mix:\\{\\%).*?(?-mix:\\%\\})|(?-mix:\\{\\{).*?(?-mix:\\}\\}?))|(?-mix:\\{\\{|\\{\\%)))");
        for(final String s : p.split("a{%b%}c{{d}}e{%f%}g{{h}}i{{j{%k"))
            System.out.println(s);
    }
}

prints

a
c
e
g
i
j
k

that is, it successfully treats {%b%}, {{d}}, {%f%}, {{h}}, {{, and {% as split-points, with all the non-greediness you'd expect. But for the record, it also works if I strip p down to just

Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");

;-)
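
One way to sanity-check that claim is to split the same input with both patterns and compare the results. This is only a sketch: the class name SimplifiedRegexCheck and the helper sameSplit are made up for illustration, and agreeing on a few sample inputs is not a proof that the two patterns are equivalent.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SimplifiedRegexCheck {
    // The original pattern from the question, as a Java string literal.
    static final Pattern ORIGINAL = Pattern.compile(
            "(?-mix:((?-mix:(?-mix:\\{\\%).*?(?-mix:\\%\\})|(?-mix:\\{\\{).*?(?-mix:\\}\\}?))|(?-mix:\\{\\{|\\{\\%)))");
    // The stripped-down version.
    static final Pattern SIMPLIFIED = Pattern.compile(
            "\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");

    // Hypothetical helper: true if both patterns split the input identically.
    static boolean sameSplit(String input) {
        return Arrays.equals(ORIGINAL.split(input), SIMPLIFIED.split(input));
    }

    public static void main(String[] args) {
        System.out.println(sameSplit("a{%b%}c{{d}}e{%f%}g{{h}}i{{j{%k"));
        System.out.println(sameSplit("{% assign foo = values %}.{{ foo[0] }}."));
    }
}
```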

Use \\\\{ instead of \\{, and likewise for the other symbols.
