
Using Regex to split a string in C#

I need to split a string from another system that represents a serialized object. The object itself can have another object of the same type nested as a property. I need a way to split the string into a string array. For example:

"{1,Dave,2}" should create a string array with 3 elements: "1", "Dave", "2".

"{1,{Cat,Yellow},2}" should become an array with 3 elements: "1", "{Cat,Yellow}", "2".

"{1,{Cat,{Blue,1}},2}" should become an array with 3 elements: "1", "{Cat,{Blue,1}}", "2".

The nesting can be N levels deep, so I could have something like "{{Cat,{Blue,1}},{Dog,White}}", and the resulting array should have 2 elements: "{Cat,{Blue,1}}" and "{Dog,White}".

I thought of writing a custom parser to parse the string manually, but this seems like the kind of problem regex was designed to solve. I'm not very good with regex, so I would appreciate some pointers from the regex pros out there.

Thanks
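For comparison, the hand-written parser the question mentions can be quite short: walk the string once, count brace depth, and cut at commas that sit at depth 0. This is a minimal sketch, assuming C# 9 or later (top-level statements) and well-formed input with exactly one outer pair of braces; the helper name SplitByDepth is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits "{a,{b,c},d}" into its top-level items by
// counting brace depth. Assumes one outer pair of braces and balanced braces inside.
static List<string> SplitByDepth(string serialized)
{
    var items = new List<string>();
    int depth = 0;  // nesting depth relative to the outer braces
    int start = 1;  // index of the first character of the current item
    for (int i = 1; i < serialized.Length - 1; i++)
    {
        char c = serialized[i];
        if (c == '{') depth++;
        else if (c == '}') depth--;
        else if (c == ',' && depth == 0)
        {
            // A comma at depth 0 ends the current item.
            items.Add(serialized.Substring(start, i - start));
            start = i + 1;
        }
    }
    // The last item runs up to, but not including, the outer '}'.
    items.Add(serialized.Substring(start, serialized.Length - 1 - start));
    return items;
}

foreach (string item in SplitByDepth("{{Cat,{Blue,1}},{Dog,White}}"))
    Console.WriteLine(item);
```

For the last example in the question this prints {Cat,{Blue,1}} and {Dog,White} on separate lines. It does no validation, so unbalanced input simply produces odd items rather than an error.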

You can use this split pattern, which relies on .NET balancing groups:

,(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)

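It matches a comma only when the rest of the string after it contains either no braces or only complete, balanced {} groups; in other words, only commas at the top nesting level. This is why the code strips the outer braces first: with them in place, every comma is nested and nothing would be split.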
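To see exactly which commas the pattern selects, a small sketch can list the index of every match instead of splitting. It assumes C# 9 or later (top-level statements); the names TopLevelComma, body and cutPoints are illustrative, and the pattern is the one above, unchanged.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The answer's top-level comma pattern, copied unchanged.
const string TopLevelComma =
    @",(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)";

// "{1,{Cat,{Blue,1}},2}" with its outer braces already removed.
string body = "1,{Cat,{Blue,1}},2";

// Each match is one comma; its index is where Regex.Split will cut.
int[] cutPoints = Regex.Matches(body, TopLevelComma)
    .Cast<Match>()
    .Select(m => m.Index)
    .ToArray();

Console.WriteLine(string.Join(", ", cutPoints)); // 1, 16
```

The commas at indexes 6 and 12 (inside {Cat,...} and {Blue,1}) are not matched, because the text after them starts inside a nested group.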

In code:

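using System;
using System.Text.RegularExpressions;
string msg = "{1,{Cat,{Blue,1}},2}";
// Drop the outer braces so only the top-level item list is split.
msg = msg.Substring(1, msg.Length - 2);
// Split only on commas that sit outside every nested {...} group.
string[] parts = Regex.Split(msg, @",(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)");
foreach (string s in parts)
{
    Console.WriteLine(s);
}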
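The same two steps (strip the outer braces, then split) can be wrapped in a reusable helper and run against every example from the question. This is a sketch, assuming C# 9 or later (top-level statements) and input that always has one outer pair of braces; SplitSerialized is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the Substring + Regex.Split steps shown above.
static string[] SplitSerialized(string serialized)
{
    // Strip the outermost '{' and '}' so only the item list remains.
    string body = serialized.Substring(1, serialized.Length - 2);
    // Cut only at commas whose remaining text is made of balanced {} groups.
    return Regex.Split(body, @",(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)");
}

string[] examples =
{
    "{1,Dave,2}",
    "{1,{Cat,Yellow},2}",
    "{1,{Cat,{Blue,1}},2}",
    "{{Cat,{Blue,1}},{Dog,White}}",
};

foreach (string example in examples)
    Console.WriteLine($"{example} -> [{string.Join(" | ", SplitSerialized(example))}]");
```

Regex.Split normally adds captured groups to its result, but the O captures are all popped again by the time the lookahead succeeds, so only the items themselves come back.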

Output:

1
{Cat,{Blue,1}}
2



Brief explanation:

(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)

This is one large lookahead, evaluated at every comma.

[^{}]* matches zero or more characters other than { and }.

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!)) will match {} groups with any level of nesting.
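The body of the lookahead can be tried on its own by anchoring it with ^ and feeding it the text that follows each comma in "1,{Cat,{Blue,1}},2". This is a sketch, assuming C# 9 or later (top-level statements); BalancedTail and tails are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

// The lookahead's body from the answer, anchored with ^ so it checks
// a whole string instead of "the text after this comma".
const string BalancedTail =
    @"^[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$";

// The text after each of the four commas in "1,{Cat,{Blue,1}},2".
string[] tails = { "{Cat,{Blue,1}},2", "{Blue,1}},2", "1}},2", "2" };

foreach (string tail in tails)
    Console.WriteLine($"{tail,-18} {Regex.IsMatch(tail, BalancedTail)}");
```

Only the first and last tails print True; the middle two start inside a nested group and hit a } with nothing left to pop, which is exactly why those commas are not split points.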

It first captures an opening { into a group named O (I chose the name to mean 'opening'). Each such capture is pushed onto O's capture stack:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
           ^
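The "stack" behaviour comes from the fact that a .NET group inside a repeated construct keeps every capture it makes. A sketch, assuming C# 9 or later (top-level statements); the braces are escaped here for readability, which .NET accepts as well as the unescaped form used in the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Each '{' matched by (?'O'\{) adds another capture to group O.
Match m = Regex.Match("{a{b{c", @"^(?:(?'O'\{)[^{}]*)+$");

Console.WriteLine(m.Groups["O"].Captures.Count); // 3
foreach (Capture c in m.Groups["O"].Captures)
    Console.WriteLine(c.Index);                  // 0, 2, 4 (one per line)
```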

Then any characters except braces:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
             ^^^^^^

The + repeats this group, so consecutive opening braces (one per nesting level, as in {Cat,{Blue) each push another capture onto O:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                    ^

This part matches a closing } and pops the most recent capture off O, balancing one opening brace:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                        ^^^^^^^^
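A minimal push/pop loop shows what the balancing group does: a } pops one capture, and a } with nothing left on O cannot match at all. This is a sketch, assuming C# 9 or later (top-level statements); PushPop is an illustrative name and the pattern is a simplified cousin of the answer's, not the answer's own.

```csharp
using System;
using System.Text.RegularExpressions;

// Push on '{', pop on '}', consume anything else.
const string PushPop = @"^(?:(?'O'\{)|[^{}]|(?'-O'\}))*$";

// Two pushes and one pop: the match succeeds and one capture is left on O.
Match m = Regex.Match("{{a}", PushPop);
Console.WriteLine(m.Success);                    // True
Console.WriteLine(m.Groups["O"].Captures.Count); // 1

// A '}' with nothing on the stack cannot be matched, so the whole match fails.
Console.WriteLine(Regex.IsMatch("a}", PushPop)); // False
```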

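Each closing brace can be followed by more non-brace characters (matched lazily), and the + repeats this so that several closing braces in a row each pop one capture: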

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                                ^^^^^^^ ^

The whole push/pop sequence can occur zero or more times:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                                          ^

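The final conditional, (?(O)(?!)), is the closing check: if any capture is still left on O (an opening brace that was never closed), the empty negative lookahead (?!) forces the match to fail. Together with the $ at the end of the lookahead, this guarantees that everything after the comma consists of balanced {} groups.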
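Dropping the conditional shows why it is needed: without it, leftover captures on O go unnoticed and an unclosed brace is accepted. A sketch, assuming C# 9 or later (top-level statements), using the same simplified push/pop loop rather than the answer's exact pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Same push/pop loop, with and without the closing (?(O)(?!)) check.
const string WithoutCheck = @"^(?:(?'O'\{)|[^{}]|(?'-O'\}))*$";
const string WithCheck    = @"^(?:(?'O'\{)|[^{}]|(?'-O'\}))*(?(O)(?!))$";

// "{{a}" leaves one '{' unclosed.
Console.WriteLine(Regex.IsMatch("{{a}", WithoutCheck)); // True: the leftover capture is ignored
Console.WriteLine(Regex.IsMatch("{{a}", WithCheck));    // False: O is non-empty, so (?!) fails
Console.WriteLine(Regex.IsMatch("{{a}}", WithCheck));   // True
```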

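This isn't a Split, but if you use the following expression with Match, you will get either a failed match or a match whose m.Groups[1].Captures contain your individual values: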

^\{(?:((?:[^{}]|\{(?<Depth>)|\}(?<-Depth>))*?)(?:,(?(Depth)(?!))|\}$))*$
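Reading the items back out of the Match takes one LINQ projection over group 1's captures. This is a sketch, assuming C# 9 or later (top-level statements); ParseItems is a hypothetical helper name, and throwing FormatException on a failed match is a design choice of this sketch, not something the answer specifies.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper around the Match-based pattern above: group 1 captures
// once per top-level item, so its Captures collection is the result array.
static string[] ParseItems(string serialized)
{
    const string ItemPattern =
        @"^\{(?:((?:[^{}]|\{(?<Depth>)|\}(?<-Depth>))*?)(?:,(?(Depth)(?!))|\}$))*$";

    Match m = Regex.Match(serialized, ItemPattern);
    if (!m.Success)
        throw new FormatException("Input did not match the pattern: " + serialized);

    // Each capture of group 1 is one top-level item, in order.
    return m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
}

foreach (string item in ParseItems("{1,{Cat,{Blue,1}},2}"))
    Console.WriteLine(item);
```

One caveat from tracing the pattern by hand: the closing \}$ branch does not check Depth, so an unbalanced input such as "{1,{Cat,2}" still matches and yields "1" and "{Cat,2". A failed match does mean the input is malformed, but a successful one does not prove it is balanced.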
