Regex matches in C# but not in Java
I have the following regex (long, I know):
(?-mix:((?-mix:(?-mix:\{\%).*?(?-mix:\%\})|(?-mix:\{\{).*?(?-mix:\}\}?))
|(?-mix:\{\{|\{\%)))
that I'm using to split a string. It matches correctly in C#, but when I moved the code to Java, it doesn't match. Is there any particular feature of this regex that is C#-only?
The source is produced as:
String source = Pattern.quote("{% assign foo = values %}.{{ foo[0] }}.");
While in C# it's:
string source = @"{% assign foo = values %}.{{ foo[0] }}.";
The C# version is like this:
string[] split = Regex.Split(source, regex);
In Java I tried both:
String[] split = source.split(regex);
and also
Pattern p = Pattern.compile(regex);
String[] split = p.split(source);
Here is a sample program with your code: http://ideone.com/hk3uy
There is a major difference here between Java and other languages: Java does not add captured groups as tokens in the result array. That means that all delimiters are removed from the result, though they would be included in .NET.
The only alternative I know is not to use split, but to get a list of matches and split manually.
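The manual approach described above could be sketched roughly as follows. This is only an illustration, not the original poster's code: the class name SplitKeepingDelimiters and the helper splitKeepingMatches are hypothetical, and the pattern is the simplified regex quoted later in this thread. It walks the matches with Matcher.find() and keeps each matched delimiter as its own element, which approximates what .NET returns when the whole delimiter is inside a capturing group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitKeepingDelimiters {

    // Hypothetical helper (not part of java.util.regex): splits `input` on
    // `pattern` but keeps every matched delimiter as a separate element.
    static List<String> splitKeepingMatches(Pattern pattern, String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int last = 0;
        while (m.find()) {
            parts.add(input.substring(last, m.start())); // text before the delimiter
            parts.add(m.group());                        // the delimiter itself
            last = m.end();
        }
        parts.add(input.substring(last)); // text after the last delimiter
        return parts;
    }

    public static void main(String[] args) {
        // Simplified form of the regex from the question (assumed equivalent).
        Pattern p = Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");
        String source = "{% assign foo = values %}.{{ foo[0] }}.";
        for (String part : splitKeepingMatches(p, source)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Unlike String.split, this sketch does not drop trailing empty strings, so an input that ends with a delimiter yields a final empty element; trim it if that matters for your use.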
I think the problem is with how you're defining source. On my system, this:
String source = Pattern.quote("{% assign foo = values %}.{{ foo[0] }}.");
is equivalent to this:
String source = "\\Q{% assign foo = values %}.{{ foo[0] }}.\\E";
(that is, it adds a stray \\Q and \\E), but the way the method is defined, your Java implementation could treat it as equivalent to this:
String source = "\\{% assign foo = values %\\}\\.\\{\\{ foo\\[0\\] \\}\\}\\.";
(that is, inserting lots of backslashes).
Your regex itself seems fine. This program:
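To make the point about Pattern.quote concrete, here is a small sketch; the class name QuoteDemo and the constant DELIMS are made up for illustration, and the pattern is the simplified regex from this thread. Pattern.quote builds a regex that matches its argument literally, so its result is meant to be compiled as a pattern rather than used as the text being split. The exact form of the quoted string is not guaranteed, so the sketch checks only what the method's contract promises.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteDemo {

    // Simplified form of the regex from the question (assumed equivalent).
    static final Pattern DELIMS =
            Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");

    public static void main(String[] args) {
        String raw = "{% assign foo = values %}.{{ foo[0] }}.";
        String quoted = Pattern.quote(raw);

        // The quoted string is a regex for `raw`, not the same text as `raw`.
        System.out.println("quoted : " + quoted);
        System.out.println("matches: " + Pattern.matches(quoted, raw));

        // Splitting the plain literal, the Java counterpart of the C# @"..." string.
        System.out.println("plain  : " + Arrays.toString(DELIMS.split(raw)));

        // Splitting the quoted string drags the quoting characters into the result.
        System.out.println("quoted : " + Arrays.toString(DELIMS.split(quoted)));
    }
}
```

In other words, the Java counterpart of the C# verbatim string is just the ordinary string literal, with no call to Pattern.quote.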
public static void main(final String... args)
{
final Pattern p = Pattern.compile("(?-mix:((?-mix:(?-mix:\\{\\%).*?(?-mix:\\%\\})|(?-mix:\\{\\{).*?(?-mix:\\}\\}?))|(?-mix:\\{\\{|\\{\\%)))");
for(final String s : p.split("a{%b%}c{{d}}e{%f%}g{{h}}i{{j{%k"))
System.out.println(s);
}
prints
a
c
e
g
i
j
k
that is, it successfully treats {%b%}, {{d}}, {%f%}, {{h}}, {{, and {% as split-points, with all the non-greediness you'd expect. But for the record, it also works if I strip p down to just
Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");
;-)
Use \\\\{ instead of \\{, and likewise for the other symbols.