How to Non-Greedily match a pattern in a multi-line input?

Question

I'm trying to create a single regex for printing lines between two patterns. I need it to be non-greedy, though.

Below is the dummy C# input file.

/// <summary>
/// This is a comment
/// </summary>
[Fact]
[Trait("a", "b")]
[InlineData(cd)]
public async Task TestOne()
{
\\ lines of code
}

[Theory]
[Trait("ab", "cd")]
[Trait("ef", "ghi")]
[InlineData(jkl)]
[InlineData(mnop)]
public async Task TestTwo(string hello)
{
\\ lines of code
}

/// <summary>
/// This is a comment
/// </summary>
[Theory]
[Trait("ab", "cd")]
[Trait("ef", "ghi")]
public async Task TestThree(bye)
{
\\ lines of code
}

/// <summary>
/// This is a comment
/// </summary>
[Fact]
[Trait("ab", "cd")]
[Trait("ef", "ghi")]
[InlineData(jkl)]
[InlineData(mnop)]
public async Task TestFour(string hello)
{
\\ lines of code
}

[Theory]
public async Task TestFive()
{
\\ lines of code
}

What I want to be printed are the lines between "TestTwo()" and its nearest preceding "[Fact]" or "[Theory]". And ONLY those lines. That is, I want the following printed:

[Theory]
[Trait("ab", "cd")]
[Trait("ef", "ghi")]
[InlineData(jkl)]
[InlineData(mnop)]
public async Task TestTwo(string hello)

I also don't want to match a fixed number of lines, since the line count varies from test to test. The test methods may or may not have parameters.

I'm implementing this in a bash script, so any bash one-liner would also help. A simple regex would be more than sufficient.

I've tried a lot of things, and this is already a compromise on what I originally wanted to achieve with a one-line regex.

Please also take a look at the following related question if you can:

How to capture all matching groups between 2 particular patterns?

Thanks in advance!

Answer 1

If a blank line is not always present between the text blocks, then the following GNU awk solution should work:

awk -v RS='\\[(Fact|Theory)]\n' '
match($0, /(.* TestTwo *\([^\n]+).+/, a) {
   print hdr a[1]
}
{hdr = RT}
' file.cpp

[Theory]
[Trait("ab", "cd")]
[Trait("ef", "ghi")]
[InlineData(jkl)]
[InlineData(mnop)]
public async Task TestTwo(string hello)

Here:

-v RS='\\[(Fact|Theory)]\n' sets the record separator to [Fact] or [Theory] followed by a line break
RT contains the text matched by the RS regex for the current record
The match function uses a capture group to keep only the part we need (everything up to the end of the TestTwo line)
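
To see how RS and RT interact in isolation, here is a minimal sketch. It assumes GNU awk is installed as gawk (a regex RS and the RT variable are gawk extensions), and the function name demo_rt and the sample text are made up for illustration:

```shell
# demo_rt is a hypothetical helper, not part of the answer above.
# It splits a tiny input on "[Fact]\n" or "[Theory]\n" and reports,
# for each record, the separator text that gawk stored in RT.
demo_rt() {
  printf 'intro\n[Fact]\nbody one\n[Theory]\nbody two\n' |
    gawk -v RS='\\[(Fact|Theory)]\n' '
      # RT already ends with a newline; it is empty for the last record
      { printf "record %d ended by: %s", NR, (RT == "" ? "(end of input)\n" : RT) }
    '
}

demo_rt
```

Because RT holds the separator that ended the current record, the answer saves it in hdr and prints it in front of the next record, which is the block that the separator actually introduces.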

Here is a GNU grep solution that does the same:

grep -oPz '(?sm)(?>^//[^\n]+\n)*\[(?>Fact|Theory)]\n(?>(?!\[(?>Fact|Theory)]\n).)*TestTwo[^\n]+\n' file.cpp


Answer 2

One awk idea:

awk '
output              { output=output ORS $0 }     # if we have something in variable "output" then add current line
/\[Fact]|\[Theory]/ { output=$0 }                # (re)set "output" to current line
/ TestTwo\(/        { if (output)                # if "output" is not empty then ...
                         print output            # print to stdout and ...
                      output=""                  # clear
                    }
' input

This generates:

[Theory]
[Trait("ab", "cd")]
[Trait("ef", "ghi")]
[InlineData(jkl)]
[InlineData(mnop)]
public async Task TestTwo(string hello)
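
If the test name needs to vary, the same buffer-and-reset idea could be wrapped in a shell function. This is only a sketch under a few assumptions: the function name print_test_header, the awk variable names, and the shortened sample file are all hypothetical, and it uses only POSIX awk features.

```shell
# print_test_header NAME FILE  (hypothetical helper)
# Prints the lines from the nearest preceding [Fact]/[Theory] up to the
# line containing " NAME(".
print_test_header() {
  awk -v name="$1" '
    /^\[(Fact|Theory)\]/ { buf = $0; collecting = 1; next }  # restart the buffer at each marker
    collecting           { buf = buf ORS $0 }                # keep appending lines after a marker
    collecting && index($0, " " name "(") { print buf; exit } # signature found: print and stop
  ' "$2"
}

# Shortened sample input, written to a temporary file for the demo
sample_a=$(mktemp)
cat > "$sample_a" <<'EOF'
[Fact]
[Trait("a", "b")]
public async Task TestOne()
{
}

[Theory]
[Trait("ab", "cd")]
public async Task TestTwo(string hello)
{
}
EOF

print_test_header TestTwo "$sample_a"
```

Resetting the buffer at every marker is what makes the match "non-greedy": only the lines after the most recent [Fact] or [Theory] survive until the signature line is reached.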

Answer 3

$ tac input|awk '/TestTwo/ {f=1} /\[Theory\]|\[Fact\]/ && f {f=0; print} f{print}'|tac 
[Theory]
[Trait("ab", "cd")]
[Trait("ef", "ghi")]
[InlineData(jkl)]
[InlineData(mnop)]
public async Task TestTwo(string hello)
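
The one-liner works because reversing the file turns "nearest preceding marker" into "first marker after the signature". A commented sketch of the same idea follows; the function name nearest_block and the sample file are assumptions for illustration, and tac is assumed to be available (it ships with GNU coreutils).

```shell
# nearest_block NAME FILE  (hypothetical helper)
nearest_block() {
  tac "$2" |                                    # read the file bottom-up
    awk -v name="$1" '
      index($0, " " name "(") { f = 1 }         # start printing at the method signature
      f                       { print }
      f && /^\[(Fact|Theory)\]/ { exit }        # first marker seen in reverse is the nearest one
    ' |
    tac                                         # restore the original line order
}

# Shortened sample input for the demo
sample_b=$(mktemp)
printf '%s\n' \
  '[Fact]' \
  'public async Task TestOne()' \
  '{' \
  '}' \
  '' \
  '[Theory]' \
  '[InlineData(jkl)]' \
  'public async Task TestTwo(string hello)' \
  '{' \
  '}' > "$sample_b"

nearest_block TestTwo "$sample_b"
```

Unlike the one-liner, this version stops reading as soon as the marker is found and matches on " NAME(" rather than on the bare name.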

How to Non-Greedily match a pattern in a multi-line input?

Question

3 answers

solution1
2 ACCEPTED 2022-08-17 19:57:36

solution2
1 2022-08-17 20:28:32

solution3
0 2022-08-18 08:08:40
