.NET regex to capture groups plus everything else

I am trying to create a .NET regex that will capture the whole string into different groups. Capturing the tag groups is easy, but capturing the rest of the text is beyond me.

The [BBCode] markers can appear anywhere in the string, be the only thing in it, or not be present at all. There may also be plain [ brackets ] in the string.

Having group names would be a bonus. 具有组名将是一个奖励。

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string input = "thinking [ of using ] BBCode format [A=16] and [E=2] here [V=8] and so on";
        // This pattern finds only the [A=n], [E=n] and [V=n] tags, not the text between them
        string regexString = @"((\[A=[0-9]+\])|(\[E=[0-9]+\])|(\[V=[0-9]+\]))";
        MatchCollection matches = Regex.Matches(input, regexString);

        foreach (Match match in matches)
        {
            Console.WriteLine(match.Value);
        }
    }
}
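A minimal sketch of one way to also capture "the rest" with a loop like the one above: each `Match` exposes `Index` and `Length`, so the text between two tags is the substring from the end of the previous match to the start of the current one. It uses top-level statements (C# 9 or later), the tag pattern is simplified to `\[[AEV]=[0-9]+\]`, and `SplitKeepingTags` is a hypothetical helper name, not a library API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text runs and the [X=n] tags in input order.
List<string> SplitKeepingTags(string text)
{
    var parts = new List<string>();
    int last = 0; // end position of the previous tag (0 before the first one)
    foreach (Match m in Regex.Matches(text, @"\[[AEV]=[0-9]+\]"))
    {
        if (m.Index > last)
            parts.Add(text.Substring(last, m.Index - last)); // text before this tag
        parts.Add(m.Value);                                  // the tag itself
        last = m.Index + m.Length;
    }
    if (last < text.Length)
        parts.Add(text.Substring(last)); // trailing text after the final tag
    return parts;
}

string input = "thinking [ of using ] BBCode format [A=16] and [E=2] here [V=8] and so on";
foreach (string part in SplitKeepingTags(input))
    Console.WriteLine(part);
```

The text runs keep their surrounding spaces (for example `" and "`); call `Trim()` on them if that matters. Plain `[ brackets ]` are left inside the text runs because they do not match the tag pattern.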

The result I am after is (one group per line):

thinking [ of using ] BBCode format
[A=16]
and
[E=2]
here
[V=8]
and so on
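That output is also what `Regex.Split` returns when the delimiter pattern is wrapped in a capturing group: .NET then includes the captured delimiters in the result array between the pieces of text. A minimal sketch (this swaps in `Regex.Split` instead of `Regex.Matches`, uses top-level statements, and the empty-entry filter is an assumption about what the caller wants, since a tag at the very start or end, or two adjacent tags, yields empty strings):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "thinking [ of using ] BBCode format [A=16] and [E=2] here [V=8] and so on";

// The capturing group (...) makes Split keep each tag as its own array element.
string[] parts = Regex.Split(input, @"(\[[AEV]=[0-9]+\])")
    .Where(p => p.Length > 0) // drop empty entries around leading, trailing or adjacent tags
    .ToArray();

foreach (string part in parts)
    Console.WriteLine(part);
```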

    using System;
    using System.Text.RegularExpressions;

    string input = "thinking of[ using BBCode format [A=16] here [E=2] and [V=8] and so on";
    var firstText = Regex.Match(input, @".+?(?=\[A)");    // match everything up to [A
    Console.WriteLine(firstText); // thinking of[ using BBCode format
    input = Regex.Replace(input, @".+?(?=\[A)", "");      // remove that text from the input
    var aValue = Regex.Match(input, @"\[A=[0-9]+\]");     // match the [A=...] tag
    input = Regex.Replace(input, @"\[A=[0-9]+\] ", "");   // remove the tag and the space after it
    Console.WriteLine(aValue); // [A=16]
    var aText = Regex.Match(input, @".+?(?=\[)");         // match the text up to the next [
    Console.WriteLine(aText); // here

One huge regex is hard to understand, so I would just use a few more lines here. This code, for example, matches the wanted text and then removes it from the input. This way you capture the groups one by one, and it stays clear which part of the code captures which text, in case the regex needs to be modified in the future.
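The steps above are spelled out for `[A` only. A minimal sketch of the same match-then-remove idea as a loop, assuming the tags are always `[A=n]`, `[E=n]` or `[V=n]` in any order: at each step it takes either a tag, or the text up to the next tag (or the end of the string), from the front of the input and then cuts it off. `ConsumeParts` is a hypothetical name; the code uses top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: repeatedly matches a tag, or the text up to the next tag,
// at the start of the remaining input and then removes it.
List<string> ConsumeParts(string text)
{
    const string tag = @"\[[AEV]=[0-9]+\]";
    // Singleline lets '.' cross newlines; \z is the absolute end of the string.
    var front = new Regex($@"^(?:{tag}|.+?(?={tag}|\z))", RegexOptions.Singleline);
    var parts = new List<string>();
    while (text.Length > 0)
    {
        Match m = front.Match(text);      // always matches at least one character here
        parts.Add(m.Value);
        text = text.Substring(m.Length); // remove what was just captured
    }
    return parts;
}

string input = "thinking of[ using BBCode format [A=16] here [E=2] and [V=8] and so on";
foreach (string part in ConsumeParts(input))
    Console.WriteLine(part);
```

Because every step consumes at least one character, the loop always terminates; the repeated `Substring` copies are fine for short strings like these.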

The regex itself is actually rather simple:

using System.Text.RegularExpressions;

var input = "thinking [ of using ] BBCode format [A=16] and [E=2] here [V=8] and so on";

var pattern = @"^(?:(.*?)(\[[AEV]=\d+\]))*(.*?)$";

var match = Regex.Match(input, pattern);
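To see what this single match holds, compare a group's `Value` with its `Captures`: for a group inside a repeated construct such as `(?:...)*`, .NET's `Group.Value` is only the last iteration's capture, while `Group.Captures` keeps every iteration in order. A small sketch with the same input and pattern (top-level statements):

```csharp
using System;
using System.Text.RegularExpressions;

var input = "thinking [ of using ] BBCode format [A=16] and [E=2] here [V=8] and so on";
var pattern = @"^(?:(.*?)(\[[AEV]=\d+\]))*(.*?)$";
var match = Regex.Match(input, pattern);

// Group 2 sits inside the repeated (?:...)*, so Value holds only its last capture.
Console.WriteLine(match.Groups[2].Value); // [V=8]

// Captures holds one entry per iteration of the repeated group.
foreach (Capture c in match.Groups[2].Captures)
    Console.WriteLine($"{c.Index}: {c.Value}");
```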

The catch is that most regex engines keep only the last value of a repeated group, so you usually cannot capture a variable number of groups. .NET does record every repetition, but you need to go through the groups and their captures to actually get all the parts you need. The full code would look like this:

using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "thinking [ of using ] BBCode format [A=16] and [E=2] here [V=8] and so on";

var pattern = @"^(?:(.*?)(\[[AEV]=\d+\]))*(.*?)$";

var match = Regex.Match(input, pattern);

var captures =
    match
        .Groups
        .OfType<Group>()
        .Skip(1) // first Group is the whole Match itself
        .SelectMany(g => g.Captures.OfType<Capture>()) // every capture of every group, not just the last one
        .OrderBy(c => c.Index); // order the captures by index to get them in appearance order, not in group order

foreach (var capture in captures)
{
    Console.WriteLine(capture.Value);
}

This can easily be extended to support group names (though that does not seem very valuable) or other tags.
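A sketch of the group-name extension, assuming the names `text` and `tag`: .NET allows the same group name to appear more than once in a pattern, and all captures for that name are collected in one `Group`, so each piece can be labelled with the group it came from. Filtering out empty `text` captures (produced by a tag at the start or by two adjacent tags) is a choice made here, not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "thinking [ of using ] BBCode format [A=16] and [E=2] here [V=8] and so on";

// "text" is defined twice; both definitions feed the same named group.
var pattern = @"^(?:(?<text>.*?)(?<tag>\[[AEV]=\d+\]))*(?<text>.*?)$";
var match = Regex.Match(input, pattern);

var pieces =
    new[] { "text", "tag" }
        .SelectMany(name => match.Groups[name].Captures
            .Cast<Capture>()
            .Select(c => (Name: name, c.Index, c.Value)))
        .Where(p => p.Value.Length > 0) // skip empty text runs
        .OrderBy(p => p.Index)          // appearance order in the input
        .ToList();

foreach (var piece in pieces)
    Console.WriteLine($"{piece.Name}: {piece.Value}");
```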

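Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.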

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM