
RegEx for extracting lines between 2 strings in C#

I have the data below in a log file, and I want to extract the lines between the two phrases "Process Started" and "Process Completed", including the beginning of the first line and the end of the last line.

2016-11-28 12:18:59.5286 | 14 | Info | Process Started -ABC *****
....
..
2016-11-28 12:18:59.5286 | 14 | Info | Process Completed -ABC, Status: Failed***



2016-11-28 13:18:59.5286 | 14 | Info | Process Started -DEF
....
..
2016-11-28 13:18:59.5286 | 14 | Info | Process Completed -DEF Status: Passed***

Using the regex below I'm able to extract the lines, but the beginning of the first matched line and the end of the last matched line are missing.

Regex r = new Regex("^*?Process Started -"+process.Name+"(.*?)Process Completed: "+process.Name+".*?", RegexOptions.Singleline);

The above regex returns this:

Process Started -ABC *****
....
..
2016-11-28 12:18:59.5286 | 14 | Info | Process Completed

But I need this:

2016-11-28 12:18:59.5286 | 14 | Info | Process Started -ABC *****
....
..
2016-11-28 12:18:59.5286 | 14 | Info | Process Completed -ABC, Status: Failed***

You're close, but the lazy quantifier at the end is the problem: it will match the least it has to, which is nothing in this case.

Here's a revision of your regex that works:

Regex r = new Regex("[^\n]*?Process Started -"
        + process.Name + "(.*?)Process Completed -"
        + process.Name + "[^\n]*", RegexOptions.Singleline);

Changes I made:

  • You had a colon instead of a dash after "Process Completed"
  • Most important: [^\n]* at the beginning and end never matches a newline, so it stays on a single line but picks up the rest of that line
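
To see the revised pattern end to end, here is a minimal sketch that wraps it in a method. The class name LogSectionExtractor, the method name ExtractSection, and the use of Regex.Escape on the process name are assumptions added for illustration; they are not part of the original snippet.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogSectionExtractor
{
    // Returns the full Started..Completed section for one process name,
    // or null when the log contains no such section.
    public static string ExtractSection(string log, string processName)
    {
        // Assumption: escape the name so regex metacharacters in it are
        // treated literally (the original concatenates the name as-is).
        string name = Regex.Escape(processName);

        // Verbatim strings (@"...") pass \n to the regex engine unchanged.
        var r = new Regex(
            @"[^\n]*?Process Started -" + name +
            @"(.*?)Process Completed -" + name + @"[^\n]*",
            RegexOptions.Singleline);

        Match m = r.Match(log);
        return m.Success ? m.Value : null;
    }
}
```

Calling LogSectionExtractor.ExtractSection(log, "ABC") on the sample log should give back the four ABC lines with the timestamp prefix and the trailing status intact. If the file uses Windows line endings, the match will probably end with a stray \r that you may want to trim.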

Extra Info:

I'm not sure how you plan on using this in the context of your code, but if you need to extract all such sections, rather than for one specific process name, you can grab them all at once with this variation:

Regex r = new Regex(@"[^\n]*?Process Started -(\w+)(.*?)Process Completed -\1[^\n]*", RegexOptions.Singleline);

The \1 is a backreference to whatever process name was matched by (\w+). You will end up with a collection of matches, one for each process name.
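
As a rough sketch of how that collection of matches might be consumed, the snippet below builds a dictionary from process name to section text. The names AllSections and Extract are hypothetical, and the pattern is written as a verbatim string so that \w and \1 reach the regex engine unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AllSections
{
    // Maps each process name to the full text of its Started..Completed section.
    public static Dictionary<string, string> Extract(string log)
    {
        var r = new Regex(
            @"[^\n]*?Process Started -(\w+)(.*?)Process Completed -\1[^\n]*",
            RegexOptions.Singleline);

        var sections = new Dictionary<string, string>();
        foreach (Match m in r.Matches(log))
        {
            // Groups[1] is the name captured by (\w+); m.Value is the whole section.
            sections[m.Groups[1].Value] = m.Value;
        }
        return sections;
    }
}
```

With this sketch, a process name that appears more than once keeps only its last section; collect into a list instead if every run matters.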

You'd need to use the Multiline option and then you could do something like this:

var reg = new Regex(@"^.*Process Started -ABC(.*)$(\n^.*$)*?\n(^.*Process Completed -ABC.*)$", 
                    RegexOptions.Multiline);

But it's kind of ugly. As @blaze_125 suggested in the comments, your best bet is probably to split the text into lines, iterate over them looking for the Started and Completed strings, and grab all the lines in between.

You could do something like:

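var lines = str.Split('\n');

var q = new Queue<string>();
var inBlock = false;

foreach (var l in lines)
{
    if (l.Contains("Process Started"))     // start collecting at the opening line
        inBlock = true;
    if (!inBlock)
        continue;                          // skip lines outside a Started/Completed block

    q.Enqueue(l);
    if (l.Contains("Process Completed"))   // you could use a regex here if you want more
                                           // complex matching
    {
        // the queue now holds every line of the block, from Started to Completed
        while (q.Count > 0)
            Console.WriteLine(q.Dequeue());
        inBlock = false;
    }
}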
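
If you would rather get the blocks back than print them, the same line-by-line idea can be packaged as a method. This is only a sketch: the names LogBlocks and GetBlocks are made up for illustration, and trimming '\r' is an assumption about files with Windows line endings.

```csharp
using System;
using System.Collections.Generic;

public static class LogBlocks
{
    // Returns one list of lines per "Process Started" ... "Process Completed" block.
    public static List<List<string>> GetBlocks(string log)
    {
        var blocks = new List<List<string>>();
        List<string> current = null;               // null while outside a block

        foreach (var raw in log.Split('\n'))
        {
            var line = raw.TrimEnd('\r');          // tolerate Windows line endings

            if (current == null && line.Contains("Process Started"))
                current = new List<string>();      // opening line starts a new block

            if (current == null)
                continue;                          // ignore lines between blocks

            current.Add(line);

            if (line.Contains("Process Completed"))
            {
                blocks.Add(current);               // closing line ends the block
                current = null;
            }
        }
        return blocks;
    }
}
```

Each inner list starts with the full "Process Started" line and ends with the full "Process Completed" line, so string.Join("\n", block) rebuilds the section text.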
