Extracting tokens from a string with regular expressions in .NET

Question

I'm curious whether this is even possible with Regex. I want to extract tokens from a string similar to:

Select a [COLOR] and a [SIZE].

OK, easy enough: I can use (\[[A-Z]+\])

However, I also want to extract the text between the tokens. Basically, I want the matched groups for the above to be:

"Select a "
"[COLOR]"
" and a "
"[SIZE]"
"."

What's the best approach for this? If there's a way to do this with Regex, that would be great. Otherwise, I'm guessing I have to extract the tokens, then manually loop through the MatchCollection and parse out the substrings based on the index and length of each Match. Please note that I need to preserve the order of the strings and tokens. Is there a better algorithm for this sort of string parsing?

Answer 1

Use Regex.Split(s, @"(\[[A-Z]+\])"); it should give you the exact array you're after. When the pattern contains a capturing group, Split includes the captured text in the result array, in order, alongside the text between matches.
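As a minimal sketch of the call above, applied to the sample string from the question: the class and method names (TokenSplitDemo, SplitKeepingTokens) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenSplitDemo
{
    // Hypothetical helper: splits on bracketed upper-case tokens.
    // The parentheses form a capturing group, so Regex.Split keeps
    // each matched token in the returned array.
    public static string[] SplitKeepingTokens(string input)
    {
        return Regex.Split(input, @"(\[[A-Z]+\])");
    }

    public static void Main()
    {
        // Print each piece in quotes so leading/trailing spaces are visible.
        foreach (string part in SplitKeepingTokens("Select a [COLOR] and a [SIZE]."))
        {
            Console.WriteLine("\"" + part + "\"");
        }
    }
}
```

One thing to watch for: if the input starts or ends with a token, the array contains an empty string at that end, so filter those out if they are not wanted.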

Answer 2

Here is a method that avoids regular expressions (Regex) and uses String.Split instead, but you lose the delimiters.

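        // Debug.WriteLine requires: using System.Diagnostics;
        string s = "Select a [COLOR] and a [SIZE].";

        // Split on both bracket characters; the brackets themselves are discarded.
        string[] sParts = s.Split('[', ']');

        foreach (string sPart in sParts)
        {
            Debug.WriteLine(sPart);
        }

        // Output:
        // Select a 
        // COLOR
        //  and a 
        // SIZE
        // .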
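For comparison, the manual approach the question describes (walking the matches and slicing by index and length) keeps the brackets and the order. This is only a sketch of that idea; TokenWalkDemo and Segments are hypothetical names, not something from the original posts.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenWalkDemo
{
    // Hypothetical helper: returns literal text and [TOKEN] pieces in order.
    public static List<string> Segments(string input)
    {
        var result = new List<string>();
        int position = 0; // index just past the previous match

        foreach (Match match in Regex.Matches(input, @"\[[A-Z]+\]"))
        {
            // Text between the previous match (or the start) and this token.
            if (match.Index > position)
            {
                result.Add(input.Substring(position, match.Index - position));
            }
            result.Add(match.Value); // the token itself, brackets included
            position = match.Index + match.Length;
        }

        // Any text left after the last token.
        if (position < input.Length)
        {
            result.Add(input.Substring(position));
        }
        return result;
    }
}
```

Unlike a plain split, this version skips empty segments, so a string that begins or ends with a token produces no empty entries.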

Answer 1
11, accepted, 2011-05-02 08:29:23

Answer 2
0, 2012-06-09 02:22:54
