
How do I use regex to split only on commas not in angle brackets?

I have the string DoubleGeneric<DoubleGeneric<int,string>,string>

I am trying to grab the 2 type arguments: DoubleGeneric<int,string> and string

Initially I was using a split on ','. This worked, but only if the generic args are not themselves generic.

My Code:

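string fullName = "DoubleGeneric<DoubleGeneric<int,string>,string>";
// Group 1 captures the outer type name; group 2 captures everything inside the outermost angle brackets.
Regex regex = new Regex( @"([a-zA-Z\._]+)\<(.+)\>$" );
Match m = regex.Match( fullName );
string frontName = m.Groups[1].Value;
// Naive split: this also breaks on the comma nested inside DoubleGeneric<int,string>.
string[] innerTypes = m.Groups[2].Value.Split( ',' );

foreach( string strInnerType in innerTypes ) {
    Console.WriteLine( strInnerType );
}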

Question: How do I do a regex split on commas that are not encapsulated in angle brackets?

Both commas are between angle brackets! Regex does a poor job of parsing complex nested syntax. The real question is how to find a comma that is inside the outermost angle brackets but not inside any nested pair. I don't think this can be done with regex.

If possible, try to work with Reflection. You might also use CS-Script to compile your code snippet and then use Reflection to retrieve the information you need.
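As a rough illustration of the Reflection suggestion, the sketch below reads the top-level type arguments from a System.Type instead of parsing a string. It assumes the type is actually available at run time; Dictionary<,> stands in for the question's DoubleGeneric<,>, and the ReflectionDemo class and TopLevelArgs helper are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;

static class ReflectionDemo {
    // Returns the top-level generic type arguments of a constructed generic type,
    // or an empty array if the type is not generic.
    public static Type[] TopLevelArgs( Type type ) {
        return type.IsGenericType ? type.GetGenericArguments() : Type.EmptyTypes;
    }

    static void Main() {
        // Dictionary<,> is only a stand-in for DoubleGeneric<,> from the question.
        Type t = typeof( Dictionary<Dictionary<int, string>, string> );
        foreach( Type arg in TopLevelArgs( t ) ) {
            Console.WriteLine( arg );
        }
    }
}
```

Nested arguments come back as a single Type, so no comma handling is needed; calling GetGenericArguments() again on that element descends one level further.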

To split the example you have given you can use the following. However, this is not generic; it could be made generic based on the other strings that you expect. Depending on the variation of the strings you have, this method could get complex; but I would suggest that the use of Roslyn here is overkill...

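string fullName = "DoubleGeneric<DoubleGeneric<int,string>,string>";
// Matches exactly the shape <Name<Name,Name>,Name>, i.e. this specific example.
Regex Reg =
    new Regex(@"(?i)<\s*\p{L}+\s*<\s*\p{L}+\s*,\s*\p{L}+\s*>\s*,\s*\p{L}+\s*>");
Match m = Reg.Match(fullName);
// Strip the outermost angle brackets from the match.
string str = m.ToString().Trim(new char[] { '<', '>' });
// Split on a comma only if no '>' appears anywhere after it.
Regex rr = new Regex(@"(?i),(?!.*>\s*)");
string[] strArr = rr.Split(str);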

I hope this helps.

The answers are correct: regex is the wrong approach.

I ended up doing a linear pass that replaces every comma nested inside angle brackets with a ~, splitting on the remaining commas, and then restoring each ~ to a comma.

using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

class Program {

    static void Main( string[] args ) {

        string fullName = "Outer<blah<int,string>,int,blah<int,int>>";

        // Group 1: outer type name; group 2: everything inside the outermost angle brackets.
        Regex regex = new Regex( @"([a-zA-Z\._]+)\<(.+)\>$" );
        Match m = regex.Match( fullName );
        string frontName = m.Groups[1].Value;
        string inner = m.Groups[2].Value;

        var genArgs = ParseInnerGenericArgs( inner );

        foreach( string s in genArgs ) {
            Console.WriteLine( s );
        }
        Console.ReadKey();
    }

    private static IEnumerable<string> ParseInnerGenericArgs( string inner ) {
        List<string> pieces = new List<string>();
        int angleCount = 0;   // current nesting depth of angle brackets
        StringBuilder sb = new StringBuilder();
        for( int i = 0; i < inner.Length; i++ ) {
            char currChar = inner[i];
            if( currChar == '>' ) {
                angleCount--;
            }
            if( currChar == '<' ) {
                angleCount++;
            }
            if( currChar == ',' && angleCount > 0 ) {
                // Nested comma: mask it so the split below ignores it.
                sb.Append( '~' );
            } else {
                sb.Append( currChar );
            }
        }
        // Split on the remaining top-level commas, then restore the masked ones.
        foreach( string item in sb.ToString().Split( ',' ) ) {
            pieces.Add( item.Replace( '~', ',' ) );
        }
        return pieces;
    }
}
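The same linear pass can also be written without the ~ placeholder, by cutting the string directly whenever a comma is seen at depth zero. This is only a sketch of that variant, not the code the poster used; the GenericArgSplitter class and SplitTopLevel method are hypothetical names, and the input is assumed to be the text between the outermost angle brackets with balanced nesting.

```csharp
using System;
using System.Collections.Generic;

static class GenericArgSplitter {
    // Splits on commas that are not inside any angle brackets.
    public static List<string> SplitTopLevel( string inner ) {
        var pieces = new List<string>();
        int depth = 0;   // current angle-bracket nesting depth
        int start = 0;   // start index of the piece being scanned
        for( int i = 0; i < inner.Length; i++ ) {
            char c = inner[i];
            if( c == '<' ) {
                depth++;
            } else if( c == '>' ) {
                depth--;
            } else if( c == ',' && depth == 0 ) {
                // Top-level comma: emit the piece that ends here.
                pieces.Add( inner.Substring( start, i - start ) );
                start = i + 1;
            }
        }
        // The last piece has no trailing comma.
        pieces.Add( inner.Substring( start ) );
        return pieces;
    }

    static void Main() {
        foreach( string s in SplitTopLevel( "blah<int,string>,int,blah<int,int>" ) ) {
            Console.WriteLine( s );
        }
    }
}
```

Skipping the placeholder means the input may contain a literal ~ without being corrupted, at the cost of tracking the start index by hand.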

Here is the regex I will use:

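\<(([\w\.]+)(\<.+\>)?)\,(([\w\.]+)(\<.+\>)?)\>$

([\w\.]+) matches "DoubleGeneric". (\<.+\>)? matches the optional generic args, as in DoubleGeneric<OtherGeneric<int, ...>>. The trailing \>$ matches the closing bracket of the outer type; without it the pattern cannot match a full type name.

The key point is that the expression contains only one ">," at the top level. This holds for the example, but not for every input: if the second argument's own first argument is generic, as in A<B<int>,C<D<int>,E<int>>>, the pattern splits in the wrong place.

You can use m.Groups[1] and m.Groups[4] to get the first and second type.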


 