I am trying to use C# to parse out text using a regex.
I have the following text (example 1):
Fn.If(first condition)
When the first condition is valid! This is a required section
Fn.ElseIf(some second condition)
When the second condition is valid! This is an optional section
Fn.ElseIf(third second condition)
When the third condition is valid! This is an optional section
Fn.Else
Catch all! This is an optional section
Fn.End
I want to be able to extract each section into 3 groups so that the end result looks something like this:
As you can see from the comments, group 1 (A/B/C) must exist along with the last group for the pattern to be valid. However, all the groups in between are optional, meaning they may or may not exist.
In addition to the text example above, the pattern should be able to parse the following text (example 2):
Fn.If(first condition)
When first condition is valid! This is a required section
Fn.EndIf
or text example 3:
Fn.If(first condition)
When first condition is valid! This is a required section
Fn.Else
Catch all! This is an optional section
Fn.EndIf
I am able to do this
(Fn\.If\s*)\((.+?)\)([\s\S]+)(Fn\.EndIf)
which works with text example 2.
(Fn\.ElseIf\s*)\((.+?)\)([\s\S]+)
will return the Fn.ElseIf(...).... group, and
(Fn\.Else)([\s\S]+)
will capture the Fn.Else..... groups.
However, I am struggling to put all 3 patterns together while indicating that the second pattern can match zero or more times, followed by zero or one match of the third pattern.
I tried the following which isn't working. To make it easier to read, I added a new line after each group for the sake of the question only.
(Fn\.If\s*)\((.+?)\)([\s\S]+)
((Fn\.ElseIf\s*)\((.+?)\)([\s\S]+))*
((Fn\.Else)([\s\S]+))?
(Fn\.EndIf)
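One likely reason the combined pattern fails is that every `[\s\S]+` is greedy: the first one runs to the end of the input and backtracks only as far as the last `Fn.EndIf`, so the optional groups never get a chance to match. In addition, `Fn\.Else` also matches the start of `Fn.ElseIf`, and example 1 ends with `Fn.End` rather than `Fn.EndIf`. The following is a minimal sketch of one way around this, not a definitive fix. It assumes lazy bodies, a `\b` after each keyword, and an `Fn\.End(?:If)?` terminator are acceptable, and that conditions contain no `)`. The class name `SingleRegexSketch`, the `Parse` helper, the group names, and the shortened input are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleRegexSketch
{
    // Same four parts as the attempt above, with lazy bodies ([\s\S]+?),
    // \b so that Fn.Else cannot match the start of Fn.ElseIf,
    // and a terminator that accepts both Fn.End and Fn.EndIf.
    private static readonly Regex Block = new Regex(
        @"Fn\.If\s*\((?<ifCond>.+?)\)(?<ifBody>[\s\S]+?)" +
        @"(?:Fn\.ElseIf\s*\((?<elifCond>.+?)\)(?<elifBody>[\s\S]+?))*" +
        @"(?:Fn\.Else\b(?<elseBody>[\s\S]+?))?" +
        @"Fn\.End(?:If)?\b");

    // Hypothetical helper: returns the first block found in the input.
    public static Match Parse(string input)
    {
        return Block.Match(input);
    }

    public static void Main()
    {
        // Shortened stand-in for text example 1.
        string input =
            "Fn.If(first condition)\nfirst body\n" +
            "Fn.ElseIf(second condition)\nsecond body\n" +
            "Fn.ElseIf(third condition)\nthird body\n" +
            "Fn.Else\nelse body\n" +
            "Fn.End\n";

        Match m = Parse(input);
        if (!m.Success)
        {
            Console.WriteLine("no match");
            return;
        }

        Console.WriteLine("If(" + m.Groups["ifCond"].Value + "): " + m.Groups["ifBody"].Value.Trim());

        // A group inside a repeated (...)* keeps one Capture per iteration in .NET.
        CaptureCollection conds = m.Groups["elifCond"].Captures;
        CaptureCollection bodies = m.Groups["elifBody"].Captures;
        for (int i = 0; i < conds.Count; i++)
        {
            Console.WriteLine("ElseIf(" + conds[i].Value + "): " + bodies[i].Value.Trim());
        }

        // The Fn.Else section is optional, so check Success before reading it.
        if (m.Groups["elseBody"].Success)
        {
            Console.WriteLine("Else: " + m.Groups["elseBody"].Value.Trim());
        }
    }
}
```

The sketch relies on the `Captures` collection of a repeated group, which is specific to .NET; most other regex engines keep only the last iteration of a repeated group.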
I felt using a single monolithic Regex would make things too complicated - so here's a finite-state-machine based approach that still uses Regexes to capture each line.
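// LINQPad "C# Program" query. Outside LINQPad, add using directives for System,
// System.Collections.Generic, System.Text and System.Text.RegularExpressions,
// and replace the LINQPad-only Dump() call with your own output code.
void Main()
{
    const String input =
@"Fn.If(first condition)
When the first condition is valid! This is a required section
Fn.ElseIf(some second condition)
When the second condition is valid! This is an optional section
Fn.ElseIf(third second condition)
When the third condition is valid! This is an optional section
Fn.Else
Catch all! This is an optional section
Fn.End
";

    // One regex per kind of marker line.
    Regex rIf     = new Regex( @"^Fn\.If\((.+)\)\s*$" );
    Regex rElseIf = new Regex( @"^Fn\.ElseIf\((.+)\)\s*$" );
    Regex rElse   = new Regex( @"^Fn\.Else\s*$" );
    // The question uses both Fn.End (example 1) and Fn.EndIf (examples 2 and 3).
    Regex rEnd    = new Regex( @"^Fn\.End(If)?\s*$" );

    // A verbatim string keeps the source file's line endings, so split on both kinds.
    String[] lines = input.Split( new String[] { "\r\n", "\n" }, StringSplitOptions.None );

    List<Statement> statements = new List<Statement>();
    String type = null;
    String condition = null;
    StringBuilder sb = new StringBuilder();
    State state = State.Outside;

    foreach( String line in lines )
    {
        switch( state )
        {
        case State.Outside:
            // Ignore everything until an Fn.If(...) line opens a block.
            Match mIf = rIf.Match( line );
            if( mIf.Success )
            {
                type = "Fn.If";
                condition = mIf.Groups[1].Value;
                state = State.InIf;
            }
            break;

        case State.InIf:
        case State.InElseIf:
            Match mElseIf = rElseIf.Match( line );
            if( mElseIf.Success )
            {
                // Close the previous section and start an Fn.ElseIf section.
                statements.Add( new Statement( type, condition, sb.ToString() ) );
                sb.Length = 0;
                state = State.InElseIf;
                type = "Fn.ElseIf";
                condition = mElseIf.Groups[1].Value;
            }
            else if( rElse.IsMatch( line ) )
            {
                // Close the previous section and start the Fn.Else section.
                statements.Add( new Statement( type, condition, sb.ToString() ) );
                sb.Length = 0;
                state = State.InElse;
                type = "Fn.Else";
                condition = null;
            }
            else if( rEnd.IsMatch( line ) )
            {
                // Fn.Else is optional, so the block may end here (examples 2 and 1 without Fn.Else).
                statements.Add( new Statement( type, condition, sb.ToString() ) );
                sb.Length = 0;
                state = State.Outside;
                type = null;
                condition = null;
            }
            else
            {
                // Ordinary content line; AppendLine keeps multi-line bodies from running together.
                sb.AppendLine( line );
            }
            break;

        case State.InElse:
            if( rEnd.IsMatch( line ) )
            {
                // End of the block: emit the Fn.Else section and reset.
                statements.Add( new Statement( type, condition, sb.ToString() ) );
                sb.Length = 0;
                state = State.Outside;
                type = null;
                condition = null;
            }
            else
            {
                sb.AppendLine( line );
            }
            break;
        }
    }

    statements.Dump();
}

// One parsed section: its marker type, its condition (null for Fn.Else) and its body text.
class Statement
{
    public Statement( String type, String condition, String contents )
    {
        this.Type = type;
        this.Condition = condition;
        this.Contents = contents;
    }

    public String Type { get; }
    public String Condition { get; }
    public String Contents { get; }
}

// Define other methods and classes here
enum State
{
    Outside,
    InIf,
    InElseIf,
    InElse
}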
Running in Linqpad gives me this output:
It's doable with one regex.
This is a Python version of the regex, but it should be translatable to C#.
The key is to use the same capture groups for all matches:
(Fn\.[A-Za-z]+[^\(\n]*)((\((.+?)\)(?<=\)))?([\s\S]*?)(?=Fn\.))?
Tested with all 3 examples.
Online preview: https://regex101.com/r/VqNlMm/1
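As a rough illustration of the C# translation, the sketch below applies the pattern unchanged with `Regex.Matches`, so that each section becomes one match. Group 1 is the keyword, group 4 the condition, and group 5 the body. The class name `FnBlockParser`, the `Parse` helper, and the tuple field names are hypothetical, and the trimming of whitespace is an added assumption, not part of the original regex.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FnBlockParser
{
    // The pattern from the answer, copied as-is into a verbatim string.
    private static readonly Regex Section = new Regex(
        @"(Fn\.[A-Za-z]+[^\(\n]*)((\((.+?)\)(?<=\)))?([\s\S]*?)(?=Fn\.))?");

    // Hypothetical helper: returns one (keyword, condition, body) entry per match.
    public static List<(string Keyword, string Condition, string Body)> Parse(string input)
    {
        var sections = new List<(string Keyword, string Condition, string Body)>();
        foreach (Match m in Section.Matches(input))
        {
            // Group 4 does not participate for Fn.Else or the terminator, so report null there.
            string condition = m.Groups[4].Success ? m.Groups[4].Value : null;

            // Trim() is an assumption: it drops surrounding newlines, and the "\r"
            // that [^\(\n]* would pick up with Windows line endings.
            sections.Add((m.Groups[1].Value.Trim(), condition, m.Groups[5].Value.Trim()));
        }
        return sections;
    }

    public static void Main()
    {
        // Text example 3 from the question.
        string input =
            "Fn.If(first condition)\n" +
            "When first condition is valid! This is a required section\n" +
            "Fn.Else\n" +
            "Catch all! This is an optional section\n" +
            "Fn.EndIf\n";

        foreach (var s in Parse(input))
        {
            Console.WriteLine(s.Keyword + " | " + (s.Condition ?? "(none)") + " | " + s.Body);
        }
    }
}
```

Note that the terminator line (`Fn.End` or `Fn.EndIf`) comes back as its own match with an empty body, and that the pattern does not check section order, so validating that a block starts with `Fn.If` and ends with a terminator is left to the caller.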