
How to represent optional group with regex?

I am trying to use C# to parse text with a regex.

I have the following text (example 1):

Fn.If(first condition) 
   When the first condition is valid! This is a required section
Fn.ElseIf(some second condition)
   When the second condition is valid! This is an optional section
Fn.ElseIf(third second condition)
   When the third condition is valid! This is an optional section
Fn.Else
    Catch all! This is an optional section
Fn.End

I want to be able to extract each section into 3 groups, so the end result looks something like this:

  • (Group 1A): Fn.If
  • (Group 1B): first condition
  • (Group 1C): When the first condition is valid! This is a required section
  • (Group 2A): Fn.ElseIf
  • (Group 2B): second condition
  • (Group 2C): When the second condition is valid! This is an optional section
  • (Group 3A): Fn.ElseIf
  • (Group 3B): third condition
  • (Group 3C): When the third condition is valid! This is an optional section
  • (Group 4A): Fn.Else
  • (Group 4B): Catch all! This is an optional section
  • (Group C): Fn.End

As you can see from the comments, Group 1 (A/B/C) must exist along with the last group for the pattern to be valid. However, all the groups in between are optional, meaning they may or may not exist.

In addition to the text example above, the pattern should be able to parse the following text (example 2):

Fn.If(first condition) 
   When first condition is valid! This is a required section
Fn.EndIf

or text example 3:

Fn.If(first condition) 
   When first condition is valid! This is a required section
Fn.Else
    Catch all! This is an optional section
Fn.EndIf

I am able to do this:

  1. (Fn\.If\s*)\((.+?)\)([\s\S]+)(Fn\.EndIf) works with text example 2
  2. (Fn\.ElseIf\s*)\((.+?)\)([\s\S]+) will return the Fn.ElseIf(...).... group
  3. (Fn\.Else)([\s\S]+) will capture the Fn.Else..... groups

However, I am struggling to put all 3 patterns together while indicating that pattern 2 can match zero or more times, followed by zero or one occurrence of pattern 3.

I tried the following, which isn't working. To make it easier to read, I added a new line after each group, for the sake of the question only.

(Fn\.If\s*)\((.+?)\)([\s\S]+)
((Fn\.ElseIf\s*)\((.+?)\)([\s\S]+))*
((Fn\.Else)([\s\S]+))?
(Fn\.EndIf)
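
A note on why this attempt likely fails: every `[\s\S]+` is greedy, so the first one runs as far as it can and leaves nothing for the optional groups, and the pattern ends in `Fn\.EndIf` while example 1 ends in `Fn.End`. Below is a minimal C# sketch of one way around that, not a definitive fix. It assumes section bodies never contain the text `Fn.` and conditions never contain `)`. The class name `IfBlockParser`, the method `Parse`, and the group names are made up for illustration. Each body is made lazy, and the repeated `Fn.ElseIf` group is read through `Group.Captures`, which .NET keeps for every iteration of a repeated group.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IfBlockParser
{
    // Hypothetical pattern (not from the original post).
    // Bodies are lazy ([\s\S]+?) so each one stops at the next Fn.* keyword.
    // (?!If) keeps the Fn.Else branch from matching the start of Fn.ElseIf.
    private static readonly Regex Block = new Regex(
        @"Fn\.If\s*\((?<ifCond>.+?)\)(?<ifBody>[\s\S]+?)" +
        @"(?:Fn\.ElseIf\s*\((?<elifCond>.+?)\)(?<elifBody>[\s\S]+?))*" +
        @"(?:Fn\.Else(?!If)(?<elseBody>[\s\S]+?))?" +
        @"Fn\.End(?:If)?");

    // Returns one "type|condition|body" string per section; empty list if no match.
    public static List<string> Parse(string input)
    {
        var sections = new List<string>();
        Match m = Block.Match(input);
        if (!m.Success)
        {
            return sections;
        }

        // Required Fn.If section.
        sections.Add("Fn.If|" + m.Groups["ifCond"].Value + "|" + m.Groups["ifBody"].Value.Trim());

        // Zero or more Fn.ElseIf sections: one capture per loop iteration.
        CaptureCollection conds = m.Groups["elifCond"].Captures;
        CaptureCollection bodies = m.Groups["elifBody"].Captures;
        for (int i = 0; i < conds.Count; i++)
        {
            sections.Add("Fn.ElseIf|" + conds[i].Value + "|" + bodies[i].Value.Trim());
        }

        // Optional Fn.Else section.
        if (m.Groups["elseBody"].Success)
        {
            sections.Add("Fn.Else||" + m.Groups["elseBody"].Value.Trim());
        }
        return sections;
    }

    public static void Main()
    {
        string input =
            "Fn.If(first condition)\n" +
            "   When the first condition is valid! This is a required section\n" +
            "Fn.ElseIf(some second condition)\n" +
            "   When the second condition is valid! This is an optional section\n" +
            "Fn.Else\n" +
            "    Catch all! This is an optional section\n" +
            "Fn.End";

        foreach (string section in Parse(input))
        {
            Console.WriteLine(section);
        }
    }
}
```

Because the optional parts are wrapped in `(?:...)*` and `(?:...)?`, the same pattern should also cover examples 2 and 3, where the `Fn.ElseIf` and `Fn.Else` sections are missing.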

I felt that a single monolithic regex would make things too complicated, so here's a finite-state-machine approach that still uses regexes to match each line.

void Main()
{
    const String input = 
@"Fn.If(first condition)
   When the first condition is valid! This is a required section
Fn.ElseIf(some second condition)
   When the second condition is valid! This is an optional section
Fn.ElseIf(third second condition)
   When the third condition is valid! This is an optional section
Fn.Else
    Catch all! This is an optional section
Fn.End  
    ";

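    // Each regex is matched against a single line.
    Regex rIf     = new Regex( @"^Fn\.If\((.+)\)\s*$" );
    Regex rElseIf = new Regex( @"^Fn\.ElseIf\((.+)\)\s*$" );
    Regex rElse   = new Regex( @"^Fn\.Else\s*$" );
    // Accept both terminators used in the question: Fn.End and Fn.EndIf.
    Regex rEnd    = new Regex( @"^Fn\.End(If)?\s*$" );

    // Split on both Windows and Unix line endings.
    String[] lines = input.Split(new String[] { "\r\n", "\n" }, StringSplitOptions.None );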

    List<Statement> statements = new List<Statement>();

    String type = null;
    String condition = null;
    StringBuilder sb = new StringBuilder();

    State state = State.Outside;
    foreach( String line in lines )
    {
        switch( state )
        {
        case State.Outside:

            Match mIf = rIf.Match( line );
            if( mIf.Success )
            {
                type = "Fn.If";
                condition = mIf.Groups[1].Value;

                state = State.InIf;
            }

            break;
        case State.InIf:
        case State.InElseIf:

            Match mElseIf = rElseIf.Match( line );
            if( mElseIf.Success )
            {
                statements.Add( new Statement( type, condition, sb.ToString() ) );
                sb.Length = 0;

                state = State.InElseIf;
                type = "Fn.ElseIf";
                condition = mElseIf.Groups[1].Value;
            }
            else
            {
                Match mElse = rElse.Match( line );
                if( mElse.Success )
                {
                    statements.Add(new Statement(type, condition, sb.ToString()));
                    sb.Length = 0;

                    state = State.InElse;
                    type = "Fn.Else";
                    condition = null;
                }
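                else
                {
                    Match mEndEarly = rEnd.Match( line );
                    if( mEndEarly.Success )
                    {
                        // No Fn.Else section: the terminator closes the current If/ElseIf section.
                        statements.Add( new Statement( type, condition, sb.ToString() ) );
                        sb.Length = 0;

                        state = State.Outside;
                        type = null;
                        condition = null;
                    }
                    else
                    {
                        // AppendLine keeps the line breaks of multi-line sections.
                        sb.AppendLine( line );
                    }
                }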
            }

            break;

        case State.InElse:

            Match mEnd = rEnd.Match(line);
            if (mEnd.Success)
            {
                statements.Add(new Statement(type, condition, sb.ToString()));
                sb.Length = 0;

                state = State.Outside;
                type = null;
                condition = null;
            }
            else
            {
                // AppendLine keeps the line breaks of multi-line sections.
                sb.AppendLine( line );
            }

            break;
        }
    }

    statements.Dump();
}

class Statement
{
    public Statement( String type, String condition, String contents )
    {
        this.Type = type;
        this.Condition = condition;
        this.Contents = contents;
    }

    public String Type { get; }
    public String Condition { get; }
    public String Contents { get; }
}

// Define other methods and classes here
enum State
{
    Outside,
    InIf,
    InElseIf,
    InElse
}

Running in LINQPad gives me this output:

[screenshot of the LINQPad output]

It's doable with one regex.

This is a Python version of the regex, but it should be translatable to C#.

The key is to use the same capture groups for every match:

(Fn\.[A-Za-z]+[^\(\n]*)((\((.+?)\)(?<=\)))?([\s\S]*?)(?=Fn\.))?

Tested with all 3 examples.

Online preview: https://regex101.com/r/VqNlMm/1
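
As a rough sketch of how that translation to C# might look: the pattern below is the regex from this answer, copied unchanged, applied with `Regex.Matches` so that each match is one section. The class name `SectionScanner` and the method `Scan` are hypothetical names for illustration. Group 1 is the keyword, group 4 the condition (empty when there is none), and group 5 the body. The terminating `Fn.End` or `Fn.EndIf` comes back as a match with only group 1 set.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SectionScanner
{
    // Regex taken verbatim from the answer; .NET numbers unnamed groups
    // left to right by opening parenthesis, the same way Python does.
    private static readonly Regex Section = new Regex(
        @"(Fn\.[A-Za-z]+[^\(\n]*)((\((.+?)\)(?<=\)))?([\s\S]*?)(?=Fn\.))?");

    // Returns { keyword, condition, body } for every section found.
    public static List<string[]> Scan(string input)
    {
        var sections = new List<string[]>();
        foreach (Match m in Section.Matches(input))
        {
            sections.Add(new[]
            {
                m.Groups[1].Value.Trim(), // keyword; Trim drops trailing spaces or '\r'
                m.Groups[4].Value,        // condition; "" if the group did not match
                m.Groups[5].Value.Trim()  // body; "" for the closing keyword
            });
        }
        return sections;
    }

    public static void Main()
    {
        string input =
            "Fn.If(first condition)\n" +
            "   When first condition is valid! This is a required section\n" +
            "Fn.Else\n" +
            "    Catch all! This is an optional section\n" +
            "Fn.EndIf";

        foreach (string[] s in Scan(input))
        {
            Console.WriteLine(string.Join(" | ", s));
        }
    }
}
```

Unlike a single anchored pattern, this approach does not check that the sections appear in a valid order (`Fn.If` first, terminator last), so any such validation would have to happen after scanning.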
