I tried the following regex to split data in a text file, but I found a strange bug during testing: a fairly simple file was split clearly incorrectly. Sample code to illustrate the behavior:
using System;
using System.Text.RegularExpressions;

// Comma-separated line; some fields are quoted and contain commas themselves.
const string line = "511525,3122,9,39,2007,9,39,3127,9,39,\" -49,368.11 \",\"-32,724.16\",2,1,\" 2,347.91 \", - ,\" 2,234.17 \", - ,2.2,1.143,2,1.24,FALSE,1,2,0,311,511625";
// Match a comma only if it is followed by an even number of quotes, i.e. it is outside a quoted field.
const string pattern = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

Console.WriteLine();
Console.WriteLine("SPLIT");
var splitted = Regex.Split(line, pattern, RegexOptions.Compiled);
foreach (var s in splitted)
{
    Console.WriteLine(s);
}

Console.WriteLine();
Console.WriteLine("REPLACE");
var replaced = Regex.Replace(line, pattern, "!", RegexOptions.Compiled);
Console.WriteLine(replaced);

Console.WriteLine();
Console.WriteLine("MATCH");
var matches = Regex.Matches(line, pattern);
foreach (Match match in matches)
{
    Console.WriteLine(match.Index);
}
So, as you can see, Split is the only method that produces unexpected results (it splits at invalid positions!). Both Matches and Replace give correct results. I even tested the regex in RegexBuddy, and it showed the same matches as Regex.Matches. Am I missing something, or is this a bug in the Split method?
Console output:
SPLIT
511525
, - ," 2,234.17 "
3122
, - ," 2,234.17 "
9
, - ," 2,234.17 "
39
, - ," 2,234.17 "
2007
, - ," 2,234.17 "
9
, - ," 2,234.17 "
39
, - ," 2,234.17 "
3127
, - ," 2,234.17 "
9
, - ," 2,234.17 "
39
, - ," 2,234.17 "
" -49,368.11 "
, - ," 2,234.17 "
"-32,724.16"
, - ," 2,234.17 "
2
, - ," 2,234.17 "
1
, - ," 2,234.17 "
" 2,347.91 "
- ," 2,234.17 "
-
" 2,234.17 "
" 2,234.17 "
-
2.2
1.143
2
1.24
FALSE
1
2
0
311
511625
REPLACE
511525!3122!9!39!2007!9!39!3127!9!39!" -49,368.11 "!"-32,724.16"!2!1!" 2,347.91 "! - !" 2,234.17 "! - !2.2!1.143!2!1.24!FALSE!1!2!0!311!511625
MATCH
6
11
13
16
21
23
26
31
33
36
51
64
66
68
81
87
100
106
110
116
118
123
129
131
133
135
139
Update: the fix suggested by Microsoft is to add the RegexOptions.ExplicitCapture option.
Based on the response you got from Microsoft (add ExplicitCapture), it seems the problem is the capturing group. The ExplicitCapture option turns that unnamed capturing group into a non-capturing group.
You can do the same without the option by making the group explicitly non-capturing:
const string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";
which, when tested in LINQPad, seems to produce the results you are looking for.
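A minimal sketch comparing the two fixes side by side, assuming .NET 5 or later so the snippet can use C# top-level statements. It splits the question's sample line with the non-capturing pattern, and again with the original pattern plus RegexOptions.ExplicitCapture, and compares both against the fields implied by the question's Replace output. The variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

const string line = "511525,3122,9,39,2007,9,39,3127,9,39,\" -49,368.11 \",\"-32,724.16\",2,1,\" 2,347.91 \", - ,\" 2,234.17 \", - ,2.2,1.143,2,1.24,FALSE,1,2,0,311,511625";

// Original pattern: the group (...) is capturing, so Split also returns its capture.
const string capturing = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";
// Fix 1: same lookahead, but the group is explicitly non-capturing (?:...).
const string nonCapturing = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

string[] fix1 = Regex.Split(line, nonCapturing);
// Fix 2: keep the original pattern and let ExplicitCapture ignore the unnamed group.
string[] fix2 = Regex.Split(line, capturing, RegexOptions.ExplicitCapture);
// The original call, for comparison: captured text is added between the fields.
string[] broken = Regex.Split(line, capturing);

// The sample line contains no '!', so splitting the Replace output on '!'
// yields exactly the fields the question expected.
string[] expected = Regex.Replace(line, capturing, "!").Split('!');

Console.WriteLine($"non-capturing:   {fix1.Length} fields");
Console.WriteLine($"ExplicitCapture: {fix2.Length} fields");
Console.WriteLine($"original:        {broken.Length} fields");
foreach (var field in fix1)
{
    Console.WriteLine(field);
}
```

Both fixes should report 28 fields (one more than the 27 match indices listed above), while the original call returns extra elements. The non-capturing group is arguably the more robust choice, since it keeps working even if someone later drops the option.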
Whether the pattern contains any capturing groups makes a difference, as described in the docs for Regex.Split:
If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array. For example, splitting the string "plum-pear" on a hyphen placed within capturing parentheses adds a string element that contains the hyphen to the returned array.
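A short sketch of the documented rule, assuming .NET 5 or later for top-level statements. The first part reproduces the docs' "plum-pear" example; the second inspects group 1 of the original pattern at the first comma, which appears to be where the repeated `, - ," 2,234.17 "` lines in the SPLIT output come from: the `(...)*` group keeps only its last iteration, the final quoted pair before the end of the line.

```csharp
using System;
using System.Text.RegularExpressions;

// Docs example: the hyphen sits inside a capturing group,
// so it is returned as an extra element between "plum" and "pear".
string[] withGroup = Regex.Split("plum-pear", "(-)");
string[] withoutGroup = Regex.Split("plum-pear", "-");
Console.WriteLine(string.Join(" | ", withGroup));
Console.WriteLine(string.Join(" | ", withoutGroup));

// The same rule applied to the question's pattern: inspect what group 1
// captured for the first comma that the pattern matches.
const string line = "511525,3122,9,39,2007,9,39,3127,9,39,\" -49,368.11 \",\"-32,724.16\",2,1,\" 2,347.91 \", - ,\" 2,234.17 \", - ,2.2,1.143,2,1.24,FALSE,1,2,0,311,511625";
const string pattern = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";
Match first = Regex.Match(line, pattern);
Console.WriteLine(first.Index);
Console.WriteLine(first.Groups[1].Value);
```

The first comma is at index 6, matching the MATCH output, and group 1 holds `, - ," 2,234.17 "`. Regex.Split inserts that capture after every field whose trailing comma still has quoted pairs ahead of it, which is why the extra lines stop once the last quoted field has been passed.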