
Regex.Split() strange behaviour

I tried the following regex to split data in a text file, but I found a strange bug during testing: a pretty simple line was split clearly incorrectly. Sample code to illustrate the behavior:

        const string line = "511525,3122,9,39,2007,9,39,3127,9,39,\" -49,368.11 \",\"-32,724.16\",2,1,\" 2,347.91 \", -   ,\" 2,234.17 \", -   ,2.2,1.143,2,1.24,FALSE,1,2,0,311,511625";
        const string pattern = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

        Console.WriteLine();
        Console.WriteLine("SPLIT");
        var splitted = Regex.Split(line, pattern, RegexOptions.Compiled);
        foreach (var s in splitted)
        {
            Console.WriteLine(s);
        }

        Console.WriteLine();
        Console.WriteLine("REPLACE");
        var replaced = Regex.Replace(line, pattern, "!" , RegexOptions.Compiled);
        Console.WriteLine(replaced);

        Console.WriteLine();
        Console.WriteLine("MATCH");
        var matches = Regex.Matches(line, pattern);
        foreach (Match match in matches)
        {
            Console.WriteLine(match.Index);
        }

So, as you can see, Split is the only method that produces unexpected results (it splits at invalid positions!). Both Matches and Replace give absolutely correct results. I even tested the regex in RegexBuddy, and it showed the same matches as Regex.Matches! Am I missing something, or does this look like a bug in the Split method?

Console output:

SPLIT
511525
, -   ," 2,234.17 "
3122
, -   ," 2,234.17 "
9
, -   ," 2,234.17 "
39
, -   ," 2,234.17 "
2007
, -   ," 2,234.17 "
9
, -   ," 2,234.17 "
39
, -   ," 2,234.17 "
3127
, -   ," 2,234.17 "
9
, -   ," 2,234.17 "
39
, -   ," 2,234.17 "
" -49,368.11 "
, -   ," 2,234.17 "
"-32,724.16"
, -   ," 2,234.17 "
2
, -   ," 2,234.17 "
1
, -   ," 2,234.17 "
" 2,347.91 "
 -   ," 2,234.17 "
 -
" 2,234.17 "
" 2,234.17 "
 -
2.2
1.143
2
1.24
FALSE
1
2
0
311
511625

REPLACE
511525!3122!9!39!2007!9!39!3127!9!39!" -49,368.11 "!"-32,724.16"!2!1!" 2,347.91 "! -   !" 2,234.17 "! -   !2.2!1.143!2!1.24!FALSE!1!2!0!311!511625

MATCH
6
11
13
16
21
23
26
31
33
36
51
64
66
68
81
87
100
106
110
116
118
123
129
131
133
135
139

Solution from MS

(Adding the ExplicitCapture regex option)

Based on the response you got from Microsoft (add ExplicitCapture), it seems the problem is the capturing group. The ExplicitCapture option turns that capturing group into a non-capturing group.
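As a minimal sketch of that suggestion, the snippet below keeps the pattern from the question unchanged and only passes RegexOptions.ExplicitCapture to Regex.Split. The class name, the helper SplitFields, and the short input line are made up for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExplicitCaptureDemo
{
    // Same pattern as in the question: a comma followed by an even number of quotes.
    private const string Pattern = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // With ExplicitCapture, the unnamed (...) group no longer captures,
    // so Regex.Split has no captured text to add to the result.
    public static string[] SplitFields(string line)
    {
        return Regex.Split(line, Pattern, RegexOptions.ExplicitCapture);
    }

    public static void Main()
    {
        // Hypothetical short input, not the line from the question.
        foreach (var field in SplitFields("a,\"b,c\",d"))
        {
            Console.WriteLine(field);
        }
    }
}
```

For this input the sketch should print three fields: a, "b,c" and d.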

You can do the same without the option by making the group explicitly non-capturing:

const string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

which, testing with LINQPad, seems to produce the results you are looking for.
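To see the difference on a small input, here is a rough comparison of the two patterns. The class name, the helper SplitWith, and the sample line are assumptions for illustration only; the two patterns are the ones from the question and from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CapturingVsNonCapturing
{
    // Pattern from the question: the repeated group is a capturing group.
    public const string Capturing = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Pattern from this answer: the same group written as (?:...).
    public const string NonCapturing = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static string[] SplitWith(string pattern, string line)
    {
        return Regex.Split(line, pattern);
    }

    public static void Main()
    {
        // Hypothetical short input, not the line from the question.
        const string line = "a,\"b,c\",d";

        // The capturing version adds the text of group 1 after "a",
        // so "b,c" (with its quotes) shows up twice.
        Console.WriteLine(string.Join(" | ", SplitWith(Capturing, line)));

        // The non-capturing version returns only the fields.
        Console.WriteLine(string.Join(" | ", SplitWith(NonCapturing, line)));
    }
}
```

The extra element in the first line of output is the same effect as the repeated `, -   ," 2,234.17 "` entries in the question's SPLIT output: the text last captured by the group inside the lookahead.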

Whether there are any capturing groups makes a difference, as described in the docs for Regex.Split:

If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array. For example, splitting the string "plum-pear" on a hyphen placed within capturing parentheses adds a string element that contains the hyphen to the returned array.
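The documented "plum-pear" case can be reproduced with a short sketch like the following. The class and method names are placeholders; only Regex.Split and the two patterns matter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlumPearDemo
{
    // Hyphen inside capturing parentheses: the hyphen is returned as an element.
    public static string[] SplitKeepingHyphen(string input)
    {
        return Regex.Split(input, "(-)");
    }

    // Plain hyphen: only the text around it is returned.
    public static string[] SplitDroppingHyphen(string input)
    {
        return Regex.Split(input, "-");
    }

    public static void Main()
    {
        // Three elements: plum, -, pear
        Console.WriteLine(string.Join(" | ", SplitKeepingHyphen("plum-pear")));

        // Two elements: plum, pear
        Console.WriteLine(string.Join(" | ", SplitDroppingHyphen("plum-pear")));
    }
}
```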
