How do I use regex to split only on commas not in angle brackets?

I have the string DobuleGeneric<DoubleGeneric<int,string>,string>

I am trying to grab the 2 type arguments: DoubleGeneric<int,string> and string

Initially I was using a split on ','. This worked, but only if the generic args are not themselves generic.

My Code:

string fullName = "DobuleGeneric<DoubleGeneric<int,string>,string>";
Regex regex = new Regex( @"([a-zA-Z\._]+)\<(.+)\>$" );
Match m = regex.Match( fullName );
string frontName = m.Groups[1].Value;
string[] innerTypes = m.Groups[2].Value.Split( ',' );

foreach( string strInnerType in innerTypes ) {
        Console.WriteLine( strInnerType );
}

Question: How do I do a regex split on commas that are not encapsulated in angle brackets?

Both commas are between angle brackets! Regex does a bad job when parsing a complex nested syntax. The question should be how to find a comma that is between angle brackets which are themselves not between angle brackets. I don't think that this can be done with regex.
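For what it's worth, the .NET regex engine has balancing groups, which can keep a count of open angle brackets while matching. The following is only a sketch under that assumption, not a recommendation: the class name BalancedArgMatcher and the group name "open" are made up for illustration, and the pattern is harder to read than a plain loop. It matches each top-level argument instead of splitting, and it expects the text between the outermost angle brackets as input.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BalancedArgMatcher
{
    // Each match is one top-level argument:
    //   [^<>,]            any ordinary character
    //   (?<open><)        '<' pushes onto the "open" stack
    //   (?<-open>>)       '>' pops from the "open" stack
    //   (?(open),|(?!))   ',' is allowed only while the stack is non-empty
    //   (?(open)(?!))     at the end, the stack must be empty (balanced)
    private static readonly Regex TopLevelArg = new Regex(
        @"(?:[^<>,]|(?<open><)|(?<-open>>)|(?(open),|(?!)))+(?(open)(?!))");

    public static List<string> Split(string inner)
    {
        var result = new List<string>();
        foreach (Match m in TopLevelArg.Matches(inner))
        {
            result.Add(m.Value.Trim());
        }
        return result;
    }

    public static void Main()
    {
        // Input is the part between the outermost angle brackets.
        foreach (string arg in Split("DoubleGeneric<int,string>,string"))
        {
            Console.WriteLine(arg);
        }
    }
}
```

Balancing groups are specific to .NET, so this pattern is not portable to other regex engines.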

If possible, try to work with Reflection. You might also use CS-Script to compile your code snippet and then use Reflection to retrieve the information you need.
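As a rough illustration of the Reflection route: when an actual Type object is available (rather than only its name as a string), Type.GetGenericArguments() returns the top-level type arguments directly, with no string parsing. In this sketch Dictionary<,> stands in for the question's DoubleGeneric type, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class GenericArgsViaReflection
{
    // Returns the top-level generic arguments, or an empty array
    // when the type is not generic.
    public static Type[] TopLevelArgs(Type type)
    {
        return type.IsGenericType ? type.GetGenericArguments() : Type.EmptyTypes;
    }

    public static void Main()
    {
        // Dictionary<,> is a stand-in for a two-parameter generic type.
        Type t = typeof(Dictionary<Dictionary<int, string>, string>);
        foreach (Type arg in TopLevelArgs(t))
        {
            Console.WriteLine(arg);
        }
    }
}
```

Calling GetGenericArguments() again on a returned argument walks one level deeper into the nesting.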

To split the example you have given, you can use the following. However, this is not generic; it could be made generic based on the other strings that you expect. Depending on the variation of the strings you have, this method could get complex, but I would suggest that the use of Roslyn here is overkill...

string fullName = "DobuleGeneric<DoubleGeneric<int,string>,string>"; 
Regex Reg = 
    new Regex(@"(?i)<\s*\p{L}+\s*<\s*\p{L}+\s*,\s*\p{L}+\s*>\s*,\s*\p{L}+\s*>");
Match m = Reg.Match(fullName);
string str = m.ToString().Trim(new char[] { '<', '>' });
Regex rr = new Regex(@"(?i),(?!.*>\s*)");
string[] strArr = rr.Split(str);

I hope this helps.

The answers are correct: using Regex is the wrong approach.

I ended up doing a linear pass, replacing commas nested inside angle brackets with ~ characters, then splitting on the remaining commas and turning each ~ back into a comma.

static void Main( string[] args ) {

    string fullName = "Outer<blah<int,string>,int,blah<int,int>>";          

    Regex regex = new Regex( @"([a-zA-Z\._]+)\<(.+)\>$" );
    Match m = regex.Match( fullName );
    string frontName = m.Groups[1].Value;
    string inner = m.Groups[2].Value;

    var genArgs = ParseInnerGenericArgs( inner );

    foreach( string s in genArgs ) {
        Console.WriteLine(s);
    }
    Console.ReadKey();
}

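// Splits a generic argument list on top-level commas only.
// Commas nested inside angle brackets are temporarily replaced with '~'
// (assumed not to occur in type names) and restored after the split.
private static IEnumerable<string> ParseInnerGenericArgs( string inner ) {
    List<string> pieces = new List<string>();
    int angleCount = 0; // current nesting depth of angle brackets
    StringBuilder sb = new StringBuilder();
    foreach( char currChar in inner ) {
        if( currChar == '>' ) {
            angleCount--;
        }
        if( currChar == '<' ) {
            angleCount++;
        }
        if( currChar == ',' && angleCount > 0 ) {
            // Nested comma: mask it so the split below ignores it.
            sb.Append( '~' );
        } else {
            sb.Append( currChar );
        }
    }
    foreach( string item in sb.ToString().Split( ',' ) ) {
        pieces.Add( item.Replace( '~', ',' ) ); // restore masked commas
    }
    return pieces;
}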
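A possible variation on the same linear pass, sketched here with hypothetical names (GenericArgSplitter, SplitTopLevel): instead of masking nested commas with ~ and splitting afterwards, it cuts a new piece whenever it sees a comma at nesting depth 0. That removes the assumption that ~ never appears in the input.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class GenericArgSplitter
{
    // Splits only on commas that are outside all angle brackets.
    public static List<string> SplitTopLevel(string inner)
    {
        var pieces = new List<string>();
        var current = new StringBuilder();
        int depth = 0; // number of currently open '<'

        foreach (char c in inner)
        {
            if (c == '<') depth++;
            else if (c == '>') depth--;

            if (c == ',' && depth == 0)
            {
                // Top-level comma: finish the current piece.
                pieces.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        pieces.Add(current.ToString().Trim()); // last piece
        return pieces;
    }

    public static void Main()
    {
        // Prints: blah<int,string>  int  blah<int,int>  (one per line)
        foreach (string s in SplitTopLevel("blah<int,string>,int,blah<int,int>"))
        {
            Console.WriteLine(s);
        }
    }
}
```

Like the original, this does no validation: unbalanced brackets simply yield whatever pieces the depth counter produces.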

Here is the regex I would use:

\<(([\w\.]+)(\<.+\>)?)\,(([\w\.]+)(\<.+\>)?)\>$

([\w\.]+) matches "DoubleGeneric". (\<.+\>)? matches the possible generic args like DoubleGeneric<OtherGeneric<int, ...>>

The key point is that no matter how many nested generic args you have, you will have only one ">," in the whole expression.

You can use m.Groups[1] and m.Groups[4] to get the first and second Type.
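A short sketch of how this could be wired up. It assumes exactly two top-level type arguments, and it assumes the pattern ends with \>$ so that the final closing bracket of the outer type is consumed. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoArgRegexDemo
{
    // Assumption: the pattern closes with \>$ to consume the outer '>'.
    // Group 1 is the first type argument, group 4 the second.
    private static readonly Regex TwoArgs =
        new Regex(@"\<(([\w\.]+)(\<.+\>)?)\,(([\w\.]+)(\<.+\>)?)\>$");

    // Returns the two type arguments, or an empty array when there is no match.
    public static string[] Extract(string fullName)
    {
        Match m = TwoArgs.Match(fullName);
        if (!m.Success)
        {
            return new string[0];
        }
        return new[] { m.Groups[1].Value, m.Groups[4].Value };
    }

    public static void Main()
    {
        // Prints: DoubleGeneric<int,string>  string  (one per line)
        foreach (string arg in Extract("DobuleGeneric<DoubleGeneric<int,string>,string>"))
        {
            Console.WriteLine(arg);
        }
    }
}
```

Inputs with three or more top-level arguments are outside what this pattern is designed for.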
