How do I use regex to split only on commas not in angle brackets?
I have the string
DobuleGeneric<DoubleGeneric<int,string>,string>
I am trying to grab the two type arguments:
DoubleGeneric<int,string>
and string
Initially I was using a split on ','. This worked, but only if the generic args are not themselves generic.
My code:
string fullName = "DobuleGeneric<DoubleGeneric<int,string>,string>";
Regex regex = new Regex( @"([a-zA-Z\._]+)\<(.+)\>$" );
Match m = regex.Match( fullName );
string frontName = m.Groups[1].Value;
string[] innerTypes = m.Groups[2].Value.Split( ',' );
foreach( string strInnerType in innerTypes ) {
    Console.WriteLine( strInnerType );
}
Question: How do I do a regex split on commas that are not encapsulated in angle brackets?
Both commas are between angle brackets! Regex does a bad job of parsing complex nested syntax. The question should be how to find a comma that is between angle brackets which are themselves not between angle brackets. I don't think that this can be done with regex.
If possible, try to work with Reflection. You might also use CS-Script to compile your code snippet and then use Reflection to retrieve the information you need.
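As a rough illustration of the Reflection route: if a closed generic Type is available (rather than only its name as a string), Type.GetGenericArguments() returns the top-level type arguments directly, with no string parsing. In this sketch Dictionary<,> is only a stand-in for the question's two-argument generic type, and the class name ReflectionSketch is made up for the example.

```csharp
using System;
using System.Collections.Generic;

class ReflectionSketch
{
    static void Main()
    {
        // Assumption: Dictionary<,> stands in for the question's DoubleGeneric type.
        Type closed = typeof(Dictionary<Dictionary<int, string>, string>);

        // GetGenericArguments returns only the top-level arguments;
        // the nested Dictionary<int, string> comes back as a single Type.
        foreach (Type arg in closed.GetGenericArguments())
        {
            Console.WriteLine(arg);
        }
    }
}
```

This prints the two arguments using the runtime's own type-name format rather than C# syntax, so it fits best when the goal is to inspect the types, not to reproduce the original text.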
To split the example you have given, you can use the following. However, this is not generic; it could be made generic based on the other strings that you expect. Depending on the variation in the strings you have, this method could get complex, but I would suggest that using Roslyn here is overkill...
string fullName = "DobuleGeneric<DoubleGeneric<int,string>,string>";
Regex Reg =
new Regex(@"(?i)<\s*\p{L}+\s*<\s*\p{L}+\s*,\s*\p{L}+\s*>\s*,\s*\p{L}+\s*>");
Match m = Reg.Match(fullName);
string str = m.ToString().Trim(new char[] { '<', '>' });
Regex rr = new Regex(@"(?i),(?!.*>\s*)");
string[] strArr = rr.Split(str);
I hope this helps.
The answers are correct: using regex is the wrong approach. I ended up doing a linear pass, replacing the commas nested inside angle brackets with ~ placeholders, and then doing a split on the remaining commas.
static void Main( string[] args ) {
    string fullName = "Outer<blah<int,string>,int,blah<int,int>>";
    // Group 1 is the outer type name, group 2 is everything between the outermost angle brackets.
    Regex regex = new Regex( @"([a-zA-Z\._]+)\<(.+)\>$" );
    Match m = regex.Match( fullName );
    string frontName = m.Groups[1].Value;
    string inner = m.Groups[2].Value;
    var genArgs = ParseInnerGenericArgs( inner );
    foreach( string s in genArgs ) {
        Console.WriteLine(s);
    }
    Console.ReadKey();
}
private static IEnumerable<string> ParseInnerGenericArgs( string inner ) {
    List<string> pieces = new List<string>();
    int angleCount = 0; // current nesting depth of angle brackets
    StringBuilder sb = new StringBuilder();
    for( int i = 0; i < inner.Length; i++ ) {
        string currChar = inner[i].ToString();
        if( currChar == ">" ) {
            angleCount--;
        }
        if( currChar == "<" ) {
            angleCount++;
        }
        // Mask commas that sit inside angle brackets so the split below skips them.
        if( currChar == "," && angleCount > 0 ) {
            sb.Append( "~" );
        } else {
            sb.Append( currChar );
        }
    }
    // Split on the remaining top-level commas, then restore the masked ones.
    foreach( string item in sb.ToString().Split( ',' ) ) {
        pieces.Add(item.Replace('~',','));
    }
    return pieces;
}
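The same depth-counting idea can also be sketched without the ~ placeholder, by cutting the string directly wherever a comma is seen at depth zero. This is a variation on the approach above, not the code the answer actually used; the names GenericArgSplitter and SplitTopLevel are invented for the example, and it assumes the angle brackets in the input are balanced.

```csharp
using System;
using System.Collections.Generic;

static class GenericArgSplitter
{
    // Splits on commas at nesting depth 0 and leaves nested commas alone.
    // Assumes balanced angle brackets in the input.
    public static List<string> SplitTopLevel(string inner)
    {
        var pieces = new List<string>();
        int depth = 0;   // current angle-bracket nesting depth
        int start = 0;   // index where the current piece begins

        for (int i = 0; i < inner.Length; i++)
        {
            char c = inner[i];
            if (c == '<') depth++;
            else if (c == '>') depth--;
            else if (c == ',' && depth == 0)
            {
                pieces.Add(inner.Substring(start, i - start));
                start = i + 1;
            }
        }

        // Whatever follows the last top-level comma is the final piece.
        pieces.Add(inner.Substring(start));
        return pieces;
    }
}

class Program
{
    static void Main()
    {
        // Prints:
        // blah<int,string>
        // int
        // blah<int,int>
        foreach (string s in GenericArgSplitter.SplitTopLevel("blah<int,string>,int,blah<int,int>"))
        {
            Console.WriteLine(s);
        }
    }
}
```

Skipping the placeholder means a type name that happens to contain ~ cannot be corrupted, at the cost of tracking the start index by hand.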
Here is the regex I will use:
\<(([\w\.]+)(\<.+\>)?)\,(([\w\.]+)(\<.+\>)?)\>$
([\\w\\.]+) matches "DoubleGeneric".
(\\<.+\\>)? matches the possible generic args, like DoubleGeneric<OtherGeneric<int, ...>>
The key point is that no matter how many nested generic args you have, you will have only one ">," in the whole expression.
You can use m.Groups[1] and m.Groups[4] to get the first and second Type.
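A minimal sketch of applying this pattern to the string from the question might look as follows. Note the `\>` immediately before `$`, which the pattern needs in order to consume the closing bracket of the outer type. The class name RegexSketch is made up for the example, and the pattern appears to handle only the case of exactly two top-level type arguments.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexSketch
{
    static void Main()
    {
        string fullName = "DobuleGeneric<DoubleGeneric<int,string>,string>";

        // Group 1: first type argument (name plus optional generic part).
        // Group 4: second type argument (name plus optional generic part).
        var regex = new Regex(@"\<(([\w\.]+)(\<.+\>)?)\,(([\w\.]+)(\<.+\>)?)\>$");
        Match m = regex.Match(fullName);

        if (m.Success)
        {
            Console.WriteLine(m.Groups[1].Value); // DoubleGeneric<int,string>
            Console.WriteLine(m.Groups[4].Value); // string
        }
    }
}
```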