
Extracting tokens from a string with regular expressions in .NET

I'm curious whether this is even possible with Regex. I want to extract tokens from a string similar to:

Select a [COLOR] and a [SIZE].

OK, easy enough: I can use (\[[A-Z]+\])

However, I also want to extract the text between the tokens. Basically, I want the matched groups for the above to be:

"Select a "
"[COLOR]"
" and a "
"[SIZE]"
"."

What's the best approach for this? If there's a way to do this with Regex, that would be great. Otherwise, I'm guessing I have to extract the tokens, then manually loop through the MatchCollection and parse out the substrings based on the index and length of each Match. Please note that I need to preserve the order of the strings and tokens. Is there a better algorithm for this sort of string parsing?

Use Regex.Split(s, @"(\[[A-Z]+\])"); it should give you the exact array you're after. When the pattern contains a capturing group, Split includes the captured text as elements of the result array, so the tokens are kept in order between the surrounding text.
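As a minimal sketch of that suggestion (the class and method names TokenSplitDemo and SplitTokens are made up for illustration and are not from the original answer), the call could be wrapped like this:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenSplitDemo
{
    // Hypothetical helper: splits text into literal runs and [TOKEN]
    // placeholders, preserving their order. The parentheses form a capturing
    // group, which makes Regex.Split keep the matched delimiters in the result.
    public static string[] SplitTokens(string input)
    {
        return Regex.Split(input, @"(\[[A-Z]+\])");
    }

    public static void Main()
    {
        // Print each part in quotes so leading/trailing spaces are visible.
        foreach (string part in SplitTokens("Select a [COLOR] and a [SIZE]."))
        {
            Console.WriteLine("\"" + part + "\"");
        }
    }
}
```

One thing to watch for: if the input begins or ends with a token, Regex.Split returns an empty string at that end of the array, so callers may want to skip empty entries.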

Here is a method that uses String.Split instead of regular expressions (Regex), but you lose the delimiters.

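        // Requires: using System.Diagnostics;
        string s = "Select a [COLOR] and a [SIZE].";

        // Split on both bracket characters; the brackets themselves are discarded.
        string[] sParts = s.Split('[', ']');

        foreach (string sPart in sParts)
        {
            Debug.WriteLine(sPart);
        }

        // Output:
        // Select a 
        // COLOR
        //  and a 
        // SIZE
        // .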
