Regex matches in C#, but not in Java

Question

I have the following regex (long, I know):

(?-mix:((?-mix:(?-mix:\{\%).*?(?-mix:\%\})|(?-mix:\{\{).*?(?-mix:\}\}?))
|(?-mix:\{\{|\{\%)))

that I'm using to split a string. It matches correctly in C#, but when I moved the code to Java, it doesn't match. Is there any particular feature of this regex that is C#-only?

The source is produced as:

String source = Pattern.quote("{% assign foo = values %}.{{ foo[0] }}.");

While in C# it's:

string source = @"{% assign foo = values %}.{{ foo[0] }}.";

The C# version is like this:

string[] split = Regex.Split(source, regex);

In Java I tried both:

String[] split = source.split(regex);

and also

Pattern p = Pattern.compile(regex);
String[] split = p.split(source);

Answer 1

Here is a sample program with your code: http://ideone.com/hk3uy

There is a major difference here between Java and other languages: Java does not add captured groups as tokens in the result array. That means that all delimiters are removed from the result, though they would be included in .NET.
The only alternative I know of is not to use split, but to get a list of matches and split manually.
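
A minimal sketch of that manual approach, assuming the shortened pattern from Answer 2 and a hypothetical helper named splitKeepingDelimiters (the class and method names are made up for illustration and are not part of the original code). It walks the matches with Matcher.find() and collects both the text between matches and the matched delimiters. It skips empty gaps between adjacent matches, which is a choice made for this sketch and may differ from what Regex.Split returns in .NET.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitKeepingDelimiters {
    // Shortened pattern taken from Answer 2 (assumed equivalent to the long one).
    static final Pattern TAG =
            Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");

    // Hypothetical helper: returns the matches and the text between them, in order.
    static List<String> splitKeepingDelimiters(Pattern pattern, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int last = 0; // end of the previous match
        while (m.find()) {
            if (m.start() > last) {
                // Non-empty text between the previous match and this one.
                tokens.add(input.substring(last, m.start()));
            }
            tokens.add(m.group()); // the delimiter itself
            last = m.end();
        }
        if (last < input.length()) {
            tokens.add(input.substring(last)); // trailing text after the last match
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Note: the raw text, not Pattern.quote(...) of it.
        String source = "{% assign foo = values %}.{{ foo[0] }}.";
        for (String token : splitKeepingDelimiters(TAG, source)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

For the string from the question this yields four tokens: the {% ... %} tag, a dot, the {{ ... }} tag, and a final dot.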

Answer 2

I think the problem is with how you're defining source. On my system, this:

String source = Pattern.quote("{% assign foo = values %}.{{ foo[0] }}.");

is equivalent to this:

String source = "\\Q{% assign foo = values %}.{{ foo[0] }}.\\E";

(that is, it adds a stray \\Q and \\E), but given the way the method is defined, your Java implementation could instead treat it as equivalent to this:

String source = "\\{% assign foo = values %\\}\\.\\{\\{ foo\\[0\\] \\}\\}\\.";

(that is, inserting lots of backslashes).
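
To check what Pattern.quote does to the text on a given JVM, something like the following can be run. It is only a sketch: the class name QuoteDemo and its two helper methods are invented for illustration. It relies on nothing beyond the documented contract of Pattern.quote, namely that the returned string, used as a regex, matches the original text literally.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    static final String RAW = "{% assign foo = values %}.{{ foo[0] }}.";

    // True when Pattern.quote returned something other than the input text.
    static boolean quoteChangesText(String s) {
        return !Pattern.quote(s).equals(s);
    }

    // True when the quoted form, compiled as a regex, matches the raw text literally.
    static boolean quotedMatchesRaw(String s) {
        return Pattern.compile(Pattern.quote(s)).matcher(s).matches();
    }

    public static void main(String[] args) {
        // Shows the exact quoted form produced by this JVM.
        System.out.println(Pattern.quote(RAW));
        System.out.println("changed: " + quoteChangesText(RAW));
        System.out.println("matches raw text: " + quotedMatchesRaw(RAW));
    }
}
```

Either way, the quoted string is meant to be used as a pattern, not as the input to split, so source should be the plain string literal.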

Your regex itself seems fine. This program:

public static void main(final String... args)
{
    final Pattern p = Pattern.compile("(?-mix:((?-mix:(?-mix:\\{\\%).*?(?-mix:\\%\\})|(?-mix:\\{\\{).*?(?-mix:\\}\\}?))|(?-mix:\\{\\{|\\{\\%)))");
    for(final String s : p.split("a{%b%}c{{d}}e{%f%}g{{h}}i{{j{%k"))
        System.out.println(s);
}

prints

a
c
e
g
i
j
k

that is, it successfully treats {%b%}, {{d}}, {%f%}, {{h}}, {{, and {% as split points, with all the non-greediness you'd expect. But for the record, it also works if I strip p down to just

Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");

;-)
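
A quick way to compare the two patterns, as a sketch: the class name PatternComparison and the method sameSplit are hypothetical, and agreement on a couple of sample strings is only a spot check, not a proof that the patterns are equivalent.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternComparison {
    // The long pattern from the question, escaped for a Java string literal.
    static final String LONG =
            "(?-mix:((?-mix:(?-mix:\\{\\%).*?(?-mix:\\%\\})|(?-mix:\\{\\{).*?(?-mix:\\}\\}?))"
            + "|(?-mix:\\{\\{|\\{\\%)))";
    // The stripped-down pattern.
    static final String SHORT = "\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%";

    // True when both patterns split the input into the same pieces.
    static boolean sameSplit(String input) {
        String[] a = Pattern.compile(LONG).split(input);
        String[] b = Pattern.compile(SHORT).split(input);
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        String sample = "a{%b%}c{{d}}e{%f%}g{{h}}i{{j{%k";
        String fromQuestion = "{% assign foo = values %}.{{ foo[0] }}.";
        System.out.println(Arrays.toString(Pattern.compile(SHORT).split(sample)));
        System.out.println(Arrays.toString(Pattern.compile(SHORT).split(fromQuestion)));
        System.out.println(sameSplit(sample) && sameSplit(fromQuestion));
    }
}
```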

Answer 3

Use \\\\{ instead of \\{, and do the same for the other symbols.

Answer 1: score 4, accepted, 2011-12-02 21:02:02
Answer 2: score 2, 2011-12-02 20:58:18
Answer 3: score 0, 2011-12-02 20:35:07
