Extracting tokens from a string with regular expressions in .NET
I'm curious if this is even possible with Regex. I want to extract tokens from a string similar to:
Select a [COLOR] and a [SIZE].
OK, easy enough: I can use (\[[A-Z]+\])
However, I want to also extract the text between the tokens. Basically, I want the matched groups for the above to be:
"Select a "
"[COLOR]"
" and a "
"[SIZE]"
"."
What's the best approach for this? If there's a way to do this with RegEx, that would be great. Otherwise, I'm guessing I have to extract the tokens, then manually loop through the MatchCollection and parse out the substrings based on the Index and Length of each Match.
Please note I need to preserve the order of the strings and tokens. Is there a better algorithm to do this sort of string parsing?
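The fallback described in the question (match only the tokens, then use each Match's Index and Length to slice out the text in between) could look roughly like this. It is a minimal sketch in C# top-level statements, assuming tokens are uppercase ASCII letters in square brackets; `Tokenize` is a hypothetical helper name, not part of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects literal text and [TOKEN] matches in their
// original order by slicing the input with each Match's Index and Length.
static List<string> Tokenize(string input)
{
    var parts = new List<string>();
    int last = 0; // position just after the previous token

    foreach (Match m in Regex.Matches(input, @"\[[A-Z]+\]"))
    {
        // Literal text between the previous token (or the start) and this token.
        if (m.Index > last)
            parts.Add(input.Substring(last, m.Index - last));

        parts.Add(m.Value);          // the token itself, e.g. "[COLOR]"
        last = m.Index + m.Length;
    }

    // Literal text after the last token, if any.
    if (last < input.Length)
        parts.Add(input.Substring(last));

    return parts;
}

foreach (string part in Tokenize("Select a [COLOR] and a [SIZE]."))
    Console.WriteLine($"\"{part}\"");
```

Because empty slices are skipped, adjacent tokens such as "[A][B]" yield just the two tokens, with no empty strings in between.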
Use Regex.Split(s, @"(\[[A-Z]+\])"); it should give you exactly the array you're after. When the pattern contains capturing parentheses, Regex.Split includes the captured text (here, each token) in the result array, in order, between the pieces of surrounding text.
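A minimal, self-contained sketch of that Regex.Split call applied to the sample string (C# top-level statements; the variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

string s = "Select a [COLOR] and a [SIZE].";

// The capturing group (...) makes Regex.Split keep each matched token
// in the result array, in order, between the surrounding text.
string[] parts = Regex.Split(s, @"(\[[A-Z]+\])");

foreach (string part in parts)
    Console.WriteLine($"\"{part}\"");
// "Select a "
// "[COLOR]"
// " and a "
// "[SIZE]"
// "."
```

One caveat: if the input starts or ends with a token, or two tokens are adjacent, Regex.Split returns empty strings at those positions (for "[A][B]" the result is "", "[A]", "", "[B]", ""), so you may want to filter out empty entries.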
Here is a method that uses String.Split instead of regular expressions (Regex), but you lose the delimiters.
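using System.Diagnostics; // Debug.WriteLine

string s = "Select a [COLOR] and a [SIZE].";
// Split on both bracket characters; the brackets themselves are discarded.
string[] sParts = s.Split('[', ']');
foreach (string sPart in sParts)
{
    Debug.WriteLine(sPart);
}
// Select a
// COLOR
// and a
// SIZE
// .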
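If you do need the brackets back, note that the String.Split result alternates between literal text (even indices) and token names (odd indices), provided every '[' has a matching ']' and tokens are not nested. A minimal sketch that relies on that assumption to restore the delimiters (C# top-level statements; `pieces` is an illustrative name):

```csharp
using System;
using System.Collections.Generic;

string s = "Select a [COLOR] and a [SIZE].";
string[] sParts = s.Split('[', ']');

// Assumes balanced, non-nested brackets, so even indices hold literal text
// and odd indices hold token names.
var pieces = new List<string>();
for (int i = 0; i < sParts.Length; i++)
{
    if (i % 2 == 1)
        pieces.Add("[" + sParts[i] + "]");   // restore the lost delimiters
    else if (sParts[i].Length > 0)
        pieces.Add(sParts[i]);               // skip empty text around adjacent tokens
}

foreach (string piece in pieces)
    Console.WriteLine($"\"{piece}\"");
```

Unbalanced or nested brackets break the even/odd alternation, which is one reason the Regex.Split approach is more robust.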
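Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.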