How to represent optional group with regex?

I am trying to use C# to parse out text using a regex.

I have the following text (example 1):

Fn.If(first condition) 
   When the first condition is valid! This is a required section
Fn.ElseIf(some second condition)
   When the second condition is valid! This is an optional section
Fn.ElseIf(third second condition)
   When the third condition is valid! This is an optional section
Fn.Else
    Catch all! This is an optional section
Fn.End

I want to be able to extract each section into 3 groups so the end result looks something like this

  • (Group 1A): Fn.If
  • (Group 1B): first condition
  • (Group 1C): When the first condition is valid! This is a required section
  • (Group 2A): Fn.ElseIf
  • (Group 2B): second condition
  • (Group 2C): When the second condition is valid! This is an optional section
  • (Group 3A): Fn.ElseIf
  • (Group 3B): third condition
  • (Group 3C): When the third condition is valid! This is an optional section
  • (Group 4A): Fn.Else
  • (Group 4B): Catch all! This is an optional section
  • (Group 5A): Fn.End

As the section text indicates, Group 1 (A/B/C) and the last group must exist for the pattern to be valid. All the groups in between are optional: they may or may not be present.

In addition to the text example above, the pattern should be able to parse the following text example 2

Fn.If(first condition) 
   When first condition is valid! This is a required section
Fn.EndIf

or text example 3

Fn.If(first condition) 
   When first condition is valid! This is a required section
Fn.Else
    Catch all! This is an optional section
Fn.EndIf

So far I have these three patterns working separately:

  1. (Fn\.If\s*)\((.+?)\)([\s\S]+)(Fn\.EndIf) works with text example 2
  2. (Fn\.ElseIf\s*)\((.+?)\)([\s\S]+) will return the Fn.ElseIf(...).... group
  3. (Fn\.Else)([\s\S]+) will capture the Fn.Else..... groups

However, I am struggling to put all 3 patterns together while indicating that pattern 2 can match zero or more groups, followed by zero or one match of pattern 3.

I tried the following which isn't working. To make it easier to read, I added a new line after each group for the sake of the question only.

(Fn\.If\s*)\((.+?)\)([\s\S]+)
((Fn\.ElseIf\s*)\((.+?)\)([\s\S]+))*
((Fn\.Else)([\s\S]+))?
(Fn\.EndIf)

I felt using a single monolithic Regex would make things too complicated - so here's a finite-state-machine based approach that still uses Regexes to capture each line.

void Main()
{
    const String input = 
@"Fn.If(first condition)
   When the first condition is valid! This is a required section
Fn.ElseIf(some second condition)
   When the second condition is valid! This is an optional section
Fn.ElseIf(third second condition)
   When the third condition is valid! This is an optional section
Fn.Else
    Catch all! This is an optional section
Fn.End  
    ";

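    // One anchored pattern per keyword line; each is tested against a single line.
    Regex rIf     = new Regex( @"^Fn\.If\((.+)\)\s*$" );
    Regex rElseIf = new Regex( @"^Fn\.ElseIf\((.+)\)\s*$" );
    Regex rElse   = new Regex( @"^Fn\.Else\s*$" );
    // Accept both terminators used in the question: Fn.End and Fn.EndIf.
    Regex rEnd    = new Regex( @"^Fn\.End(If)?\s*$" );

    // Split on both Windows and Unix line endings.
    String[] lines = input.Split(new String[] { "\r\n", "\n" }, StringSplitOptions.None );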

    List<Statement> statements = new List<Statement>();

    String type = null;
    String condition = null;
    StringBuilder sb = new StringBuilder();

    State state = State.Outside;
    foreach( String line in lines )
    {
        switch( state )
        {
        case State.Outside:

            Match mIf = rIf.Match( line );
            if( mIf.Success )
            {
                type = "Fn.If";
                condition = mIf.Groups[1].Value;

                state = State.InIf;
            }

            break;
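        case State.InIf:
        case State.InElseIf:

            Match mElseIf = rElseIf.Match( line );
            if( mElseIf.Success )
            {
                // A new Fn.ElseIf starts: flush the section collected so far.
                statements.Add( new Statement( type, condition, sb.ToString() ) );
                sb.Length = 0;

                state = State.InElseIf;
                type = "Fn.ElseIf";
                condition = mElseIf.Groups[1].Value;
            }
            else
            {
                Match mElse = rElse.Match( line );
                if( mElse.Success )
                {
                    statements.Add(new Statement(type, condition, sb.ToString()));
                    sb.Length = 0;

                    state = State.InElse;
                    type = "Fn.Else";
                    condition = null;
                }
                else if( rEnd.IsMatch( line ) )
                {
                    // The block ends without an Fn.Else section (text examples 2-style input).
                    statements.Add(new Statement(type, condition, sb.ToString()));
                    sb.Length = 0;

                    state = State.Outside;
                    type = null;
                    condition = null;
                }
                else
                {
                    sb.Append( line );
                }
            }

            break;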

        case State.InElse:

            Match mEnd = rEnd.Match(line);
            if (mEnd.Success)
            {
                statements.Add(new Statement(type, condition, sb.ToString()));
                sb.Length = 0;

                state = State.Outside;
                type = null;
                condition = null;
            }
            else
            {
                sb.Append( line );
            }

            break;
        }
    }

    statements.Dump();
}

class Statement
{
    public Statement( String type, String condition, String contents )
    {
        this.Type = type;
        this.Condition = condition;
        this.Contents = contents;
    }

    public String Type { get; }
    public String Condition { get; }
    public String Contents { get; }
}

// Define other methods and classes here
enum State
{
    Outside,
    InIf,
    InElseIf,
    InElse
}

Running this in LINQPad gives me this output:

[image: LINQPad output]
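
For comparison, the single-pattern attempt from the question can probably be made to work in .NET. A likely cause of its failure is that every [\s\S]+ is greedy: the first one runs to the end of the input and backtracks only as far as the final Fn.EndIf, so the optional groups capture nothing (and example 1 ends with Fn.End, which (Fn\.EndIf) never matches). The sketch below is one possible fix, not the approach used in this answer: it makes the body quantifiers lazy, adds (?!If) so that Fn.Else cannot match the start of Fn.ElseIf, accepts both terminators, and reads the repeated group through Group.Captures. The class name and the group names (ifCond, elifCond and so on) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleRegexDemo
{
    // Hypothetical combined pattern; lazy bodies stop at the next keyword.
    public static readonly Regex Block = new Regex(
        @"Fn\.If\s*\((?<ifCond>.+?)\)(?<ifBody>[\s\S]+?)" +
        @"(?:Fn\.ElseIf\s*\((?<elifCond>.+?)\)(?<elifBody>[\s\S]+?))*" +
        @"(?:Fn\.Else(?!If)(?<elseBody>[\s\S]+?))?" +
        @"Fn\.End(?:If)?");

    public static void Main()
    {
        string input =
            "Fn.If(first condition)\n" +
            "   When the first condition is valid! This is a required section\n" +
            "Fn.ElseIf(some second condition)\n" +
            "   When the second condition is valid! This is an optional section\n" +
            "Fn.ElseIf(third second condition)\n" +
            "   When the third condition is valid! This is an optional section\n" +
            "Fn.Else\n" +
            "    Catch all! This is an optional section\n" +
            "Fn.End";

        Match m = Block.Match(input);
        if (!m.Success)
        {
            Console.WriteLine("No match");
            return;
        }

        Console.WriteLine("If: " + m.Groups["ifCond"].Value
            + " => " + m.Groups["ifBody"].Value.Trim());

        // .NET keeps one capture per iteration of a repeated group.
        CaptureCollection conds = m.Groups["elifCond"].Captures;
        CaptureCollection bodies = m.Groups["elifBody"].Captures;
        for (int i = 0; i < conds.Count; i++)
        {
            Console.WriteLine("ElseIf: " + conds[i].Value
                + " => " + bodies[i].Value.Trim());
        }

        // The Else section is optional, so check Success before using it.
        if (m.Groups["elseBody"].Success)
        {
            Console.WriteLine("Else => " + m.Groups["elseBody"].Value.Trim());
        }
    }
}
```

Lazy quantifiers inside a repeated group can backtrack heavily on long input that does not match, which is one reason the line-by-line state machine above may still be the safer choice.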

It's doable with one regex.

This is a Python version of the regex, but it should be translatable to C#.

The key is to use the same capture groups for every section, so that each match yields one keyword, an optional condition, and the body that follows it:

(Fn\.[A-Za-z]+[^\(\n]*)((\((.+?)\)(?<=\)))?([\s\S]*?)(?=Fn\.))?

Tested with all three examples.

online preview: https://regex101.com/r/VqNlMm/1
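
Since the answer says the pattern should be translatable to C#, here is a minimal sketch of what that might look like. The regex is the one above, unchanged; the class name FnBlockDemo, the Parse helper and the tuple field names are assumptions made for illustration. Group 1 is the keyword, group 4 the condition (empty when there is none) and group 5 the body. Trim is applied because the raw groups include surrounding whitespace and line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FnBlockDemo
{
    // The pattern from the answer, copied as-is.
    private static readonly Regex Section = new Regex(
        @"(Fn\.[A-Za-z]+[^\(\n]*)((\((.+?)\)(?<=\)))?([\s\S]*?)(?=Fn\.))?");

    // Returns one (keyword, condition, body) entry per Fn.* section.
    public static List<(string Keyword, string Condition, string Body)> Parse(string input)
    {
        var result = new List<(string Keyword, string Condition, string Body)>();
        foreach (Match m in Section.Matches(input))
        {
            result.Add((
                m.Groups[1].Value.Trim(),   // e.g. Fn.If, Fn.ElseIf, Fn.Else, Fn.EndIf
                m.Groups[4].Value,          // empty string when the group did not match
                m.Groups[5].Value.Trim()));
        }
        return result;
    }

    public static void Main()
    {
        string input =
            "Fn.If(first condition)\n" +
            "   When first condition is valid! This is a required section\n" +
            "Fn.Else\n" +
            "    Catch all! This is an optional section\n" +
            "Fn.EndIf";

        foreach (var s in Parse(input))
        {
            Console.WriteLine($"[{s.Keyword}] [{s.Condition}] [{s.Body}]");
        }
    }
}
```

Unlike a single match over the whole block, this produces one match per section, so the caller still has to check that the sequence starts with Fn.If and ends with Fn.End or Fn.EndIf.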

