I need to convert a string like
"[1,2,3,4][5,6,7,8]"
into groups of integers, adjusted to be zero-based rather than one-based:
{0,1,2,3} {4,5,6,7}
The following rules also apply:
Since I'm not that experienced with regular expressions, I'm currently using two:
@"^(?:\[(?:[1-9]+[\d]*,)+(?:[1-9]+[\d]*){1}\])+$";
and
@"\[(?:[1-9]+[\d]*,)+(?:[1-9]+[\d]*){1}\]";
I'm using the first one to check the input and the second to get all matches of a set of numbers inside square brackets.
I'm then using .NET string manipulation to trim off the square brackets and extract the numbers, parsing them and subtracting 1 to get the result I need.
I was wondering if I could get at the numbers better by using captures, but I'm not sure how they work.
Final Solution:
In the end I used the following regular expression to validate the input string:
@"^(?<set>\[(?:[1-9]\d{0,7}(?:]|,(?=\d))){2,})+$"
agent-j's pattern is fine for capturing the information needed, but it also matches a string like "[1,2,3,4][5]" and would require me to do some additional filtering of the results.
I access the captures via the named group 'set' and use a second, simple regex to extract the numbers.
The '[1-9]\d{0,7}' part limits each number to at most eight digits (99,999,999), which simplifies parsing to int and avoids overflow exceptions.
MatchCollection matches = new Regex(@"^(?<set>\[(?:[1-9]\d{0,7}(?:]|,(?=\d))){2,})+$").Matches(inputText);
if (matches.Count != 1) return;
CaptureCollection captures = matches[0].Groups["set"].Captures;
var resultJArray = new int[captures.Count][];
var numbersRegex = new Regex(@"\d+");
for (int captureIndex = 0; captureIndex < captures.Count; captureIndex++)
{
    string capture = captures[captureIndex].Value;
    MatchCollection numberMatches = numbersRegex.Matches(capture);
    resultJArray[captureIndex] = new int[numberMatches.Count];
    for (int numberMatchIndex = 0; numberMatchIndex < numberMatches.Count; numberMatchIndex++)
    {
        string number = numberMatches[numberMatchIndex].Value;
        int numberAdjustedToZeroBase = Int32.Parse(number) - 1;
        resultJArray[captureIndex][numberMatchIndex] = numberAdjustedToZeroBase;
    }
}
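The same validate-then-extract steps can also be written more compactly with LINQ. This is only a sketch of the idea, not a replacement for the loop above: the names SetParser and Parse are hypothetical, and returning null for invalid input is an assumption made for illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class SetParser
{
    // Same validation pattern as above; each bracketed set is captured by the named group "set".
    static readonly Regex Validator =
        new Regex(@"^(?<set>\[(?:[1-9]\d{0,7}(?:]|,(?=\d))){2,})+$");
    static readonly Regex Number = new Regex(@"\d+");

    // Returns one zero-based int array per bracketed set,
    // or null when the input does not validate (an assumption of this sketch).
    public static int[][] Parse(string input)
    {
        Match match = Validator.Match(input);
        if (!match.Success) return null;

        return match.Groups["set"].Captures
            .Cast<Capture>()
            .Select(set => Number.Matches(set.Value)
                .Cast<Match>()
                .Select(n => int.Parse(n.Value) - 1) // shift from one-based to zero-based
                .ToArray())
            .ToArray();
    }
}
```

Calling SetParser.Parse("[1,2,3,4][5,6,7,8]") should give two arrays, {0,1,2,3} and {4,5,6,7}.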
string input = "[1,2,3,4][5,6,7,8][534,63433,73434,8343434]";
string pattern = @"\G(?:\[(?:(\d+)(?:,|(?=\]))){2,}\])";
MatchCollection matches = Regex.Matches(input, pattern);
To start out, any (regex) with plain parentheses is a capturing group. This means that the regex engine will capture (store the positions matched by) that group. To avoid this when you don't need it, use (?:regex). I did that above.
Index 0 of the Groups collection is special: it refers to the whole match. I.e. match.Groups[0].Value is always the same as match.Value and match.Groups[0].Captures[0].Value. So you can consider the explicit groups to start at index 1. The Captures collection of a group, however, starts at index 0.
As you can see below, each match contains a bracketed digit group. You'll want to use every capture (index 0 to Count - 1) from Group 1 of each match.
foreach (Match match in matches)
{
    // e.g. [1,2]
    // Use every capture from the first explicit group.
    for (int i = 0; i < match.Groups[1].Captures.Count; i++)
    {
        int number = int.Parse(match.Groups[1].Captures[i].Value);
        if (number == 0)
            throw new Exception("Cannot be 0.");
    }
}
Match[0] => [1,2,3,4]
  Group[0] => [1,2,3,4]
    Capture[0] => [1,2,3,4]
  Group[1] => 4
    Capture[0] => 1
    Capture[1] => 2
    Capture[2] => 3
    Capture[3] => 4
Match[1] => [5,6,7,8]
  Group[0] => [5,6,7,8]
    Capture[0] => [5,6,7,8]
  Group[1] => 8
    Capture[0] => 5
    Capture[1] => 6
    Capture[2] => 7
    Capture[3] => 8
Match[2] => [534,63433,73434,8343434]
  Group[0] => [534,63433,73434,8343434]
    Capture[0] => [534,63433,73434,8343434]
  Group[1] => 8343434
    Capture[0] => 534
    Capture[1] => 63433
    Capture[2] => 73434
    Capture[3] => 8343434
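A listing like the one above can be produced by walking the Match, Group, and Capture collections in turn. The following is a minimal sketch under that assumption; CaptureDump and Describe are hypothetical names, and the exact output formatting is my own choice.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CaptureDump
{
    // Builds one line per match, per group, and per capture, indented by nesting level.
    public static List<string> Describe(string input, string pattern)
    {
        var lines = new List<string>();
        MatchCollection matches = Regex.Matches(input, pattern);
        for (int m = 0; m < matches.Count; m++)
        {
            lines.Add("Match[" + m + "] => " + matches[m].Value);
            GroupCollection groups = matches[m].Groups;
            for (int g = 0; g < groups.Count; g++)
            {
                // Group.Value is the text of the group's last capture.
                lines.Add("  Group[" + g + "] => " + groups[g].Value);
                CaptureCollection captures = groups[g].Captures;
                for (int c = 0; c < captures.Count; c++) // captures are indexed from 0
                    lines.Add("    Capture[" + c + "] => " + captures[c].Value);
            }
        }
        return lines;
    }
}
```

Passing the input and pattern from this answer to Describe and writing each returned line to the console should reproduce the structure shown above.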
The \G causes each match to begin where the previous match ended, so the groups must be contiguous (in [1,2] [3,4] only the first group is matched). The {2,} satisfies your requirement that there be at least 2 numbers per match.
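To see the effect of \G, one can count the matches with and without it. This is a small sketch for illustration only; ContiguousDemo and CountMatches are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class ContiguousDemo
{
    // The pattern from this answer without its leading \G.
    const string Body = @"(?:\[(?:(\d+)(?:,|(?=\]))){2,}\])";

    // Counts bracketed groups found in the input, optionally requiring
    // each match to start exactly where the previous one ended (\G).
    public static int CountMatches(string input, bool anchored)
    {
        string pattern = anchored ? @"\G" + Body : Body;
        return Regex.Matches(input, pattern).Count;
    }
}
```

With the input "[1,2] [3,4]", the anchored version should find one match and the unanchored version two, because the space breaks the chain of contiguous matches.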
The expression will match even if there is a 0. I suggest that you put that validation in with the other non-regex stuff. It will keep the regex simpler.
The following regex will validate the input and also yield match groups for each bracketed [] group and, inside that, for each number:
(?:([1-9][0-9]*)\,?){2,}
[1][5] - fail
[1] - fail
[] - fail
[a,b,c][5] - fail
[1,2,3,4] - pass
[1,2,3,4,5,6,7,8][5,6,7,8] - pass
[1,2,3,4][5,6,7,8][534,63433,73434,8343434] - pass
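The pass/fail list above can be checked mechanically. The sketch below assumes the pattern is applied exactly as written, unanchored, via Regex.IsMatch; PatternCheck and Passes are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class PatternCheck
{
    // Pattern copied verbatim from the answer above.
    const string Pattern = @"(?:([1-9][0-9]*)\,?){2,}";

    // True when the pattern matches anywhere in the input.
    public static bool Passes(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }
}
```

Note that, applied this way, the pattern also accepts an input such as "[12]", because the engine can split "12" into two one-digit repetitions, so some additional checking may still be needed.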
What about \d+ and the global flag?