RegEx for extracting lines between 2 strings in C#

I have the data below in a log file, and I want to extract the lines between the two phrases "Process Started" and "Process Completed", including the beginning and end of the lines that contain those phrases.

2016-11-28 12:18:59.5286 | 14 | Info | Process Started -ABC *****
....
..
2016-11-28 12:18:59.5286 | 14 | Info | Process Completed -ABC, Status: Failed***



2016-11-28 13:18:59.5286 | 14 | Info | Process Started -DEF
....
..
2016-11-28 13:18:59.5286 | 14 | Info | Process Completed -DEF Status: Passed***

Using the RegEx below I'm able to extract the lines, but the beginning of the first matched line and the end of the last matched line are missing.

Regex r = new Regex("^*?Process Started -"+process.Name+"(.*?)Process Completed: "+process.Name+".*?", RegexOptions.Singleline);

The above regex returns this:

Process Started -ABC *****
....
..
2016-11-28 12:18:59.5286 | 14 | Info | Process Completed

But I need this:

2016-11-28 12:18:59.5286 | 14 | Info | Process Started -ABC *****
....
..
2016-11-28 12:18:59.5286 | 14 | Info | Process Completed -ABC, Status: Failed***

You're close, but the lazy quantifier at the end is the problem: it matches as little as it can, which in this case is nothing.
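To see the effect in isolation, here is a minimal sketch. The class name LazyDemo, the helper FirstMatch, and the sample string are made up for illustration. It compares a trailing .*? with a trailing [^\n]* on the same input:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyDemo
{
    // Hypothetical helper: returns the text of the first match of `pattern` in `input`.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern, RegexOptions.Singleline).Value;
    }

    public static void Main()
    {
        string text = "Info | Process Completed -ABC, Status: Failed***\nnext line";

        // Lazy: with nothing after it, .*? is satisfied by zero characters.
        Console.WriteLine(FirstMatch(text, "Process Completed -ABC.*?"));
        // prints: Process Completed -ABC

        // Greedy, but unable to cross a newline: runs to the end of the line.
        Console.WriteLine(FirstMatch(text, @"Process Completed -ABC[^\n]*"));
        // prints: Process Completed -ABC, Status: Failed***
    }
}
```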

Here's a revision of your regex that works:

Regex r = new Regex("[^\n]*?Process Started -"
        + process.Name + "(.*?)Process Completed -"
        + process.Name + "[^\n]*", RegexOptions.Singleline);

Changes I made:

  • You had a colon instead of a dash after "Process Completed"
  • Most important: [^\n]* at the beginning and end cannot match a newline, so it stays on the current line but picks up the rest of that line
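As a sketch of how the revised pattern behaves on the sample log from the question: the class BlockExtractor and the method ExtractBlock are hypothetical names, and the Regex.Escape call is an addition that is not part of the regex above. It only guards against process names that contain regex metacharacters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlockExtractor
{
    // Hypothetical helper: returns the whole block for one process name, or "" if none is found.
    public static string ExtractBlock(string log, string processName)
    {
        // Assumption: names may contain characters such as '.' or '+', so escape them.
        string name = Regex.Escape(processName);

        var r = new Regex(
            @"[^\n]*?Process Started -" + name
            + "(.*?)Process Completed -" + name
            + @"[^\n]*",
            RegexOptions.Singleline);

        return r.Match(log).Value;
    }

    public static void Main()
    {
        string log =
            "2016-11-28 12:18:59.5286 | 14 | Info | Process Started -ABC *****\n" +
            "....\n" +
            "..\n" +
            "2016-11-28 12:18:59.5286 | 14 | Info | Process Completed -ABC, Status: Failed***\n" +
            "\n" +
            "2016-11-28 13:18:59.5286 | 14 | Info | Process Started -DEF\n" +
            "....\n" +
            "..\n" +
            "2016-11-28 13:18:59.5286 | 14 | Info | Process Completed -DEF Status: Passed***\n";

        // Prints the four ABC lines, timestamps and trailing status included.
        Console.WriteLine(ExtractBlock(log, "ABC"));
    }
}
```

If the log uses Windows line endings, [^\n]* will also pick up the trailing \r of the last line, so you may want to call TrimEnd('\r') on the result.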

Extra Info:

I'm not sure how you plan on using this in the context of your code, but if you need to extract all such sections, rather than the section for one specific process name, you can grab them all at once with this variation:

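// A verbatim string (@"...") is needed here: \w and \1 are not valid escapes in a regular C# string.
Regex r = new Regex(@"[^\n]*?Process Started -(\w+)(.*?)Process Completed -\1[^\n]*", RegexOptions.Singleline);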

The \1 is a backreference to whatever process name was matched by (\w+). You will end up with a collection of matches, one for each process name.
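A minimal sketch of walking that collection of matches follows. The class AllBlocks and the method ExtractAll are hypothetical names, and keying the result by process name is just one way to organize it:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AllBlocks
{
    // Hypothetical helper: maps each process name to its Started..Completed block.
    public static Dictionary<string, string> ExtractAll(string log)
    {
        var r = new Regex(
            @"[^\n]*?Process Started -(\w+)(.*?)Process Completed -\1[^\n]*",
            RegexOptions.Singleline);

        var blocks = new Dictionary<string, string>();
        foreach (Match m in r.Matches(log))
        {
            // Groups[1] is the name captured by (\w+); m.Value is the whole block.
            // If a name occurs more than once, the last block wins in this sketch.
            blocks[m.Groups[1].Value] = m.Value;
        }
        return blocks;
    }

    public static void Main()
    {
        string log =
            "2016-11-28 12:18:59.5286 | 14 | Info | Process Started -ABC *****\n" +
            "....\n" +
            "2016-11-28 12:18:59.5286 | 14 | Info | Process Completed -ABC, Status: Failed***\n" +
            "\n" +
            "2016-11-28 13:18:59.5286 | 14 | Info | Process Started -DEF\n" +
            "....\n" +
            "2016-11-28 13:18:59.5286 | 14 | Info | Process Completed -DEF Status: Passed***\n";

        foreach (var pair in ExtractAll(log))
        {
            Console.WriteLine("=== " + pair.Key + " ===");
            Console.WriteLine(pair.Value);
        }
    }
}
```

Note that \1 only has to match a prefix of what follows, so a block started by a process named AB would also accept "Process Completed -ABC". Appending \b after \1 would tighten that if your names can overlap this way.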

You'd need to use the Multiline option, and then you could do something like this:

var reg = new Regex(@"^.*Process Started -ABC(.*)$(\n^.*$)*?\n(^.*Process Completed -ABC.*)$", 
                    RegexOptions.Multiline);

But it's kind of ugly. As @blaze_125 suggested in the comments, your best bet is probably to divide it into lines and iterate, looking for the Started and Completed strings and grabbing all the lines in between.

You could do something like:

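var lines = str.Split('\n');

var q = new Queue<string>();
var inBlock = false;   // true while we are between Started and Completed

foreach (var l in lines)
{
    if (l.Contains("Process Started"))
        inBlock = true;

    if (!inBlock)
        continue;   // skip lines that fall outside any block

    q.Enqueue(l);
    if (l.Contains("Process Completed"))   // you could use a regex here if you want more
                                           // complex matching
    {
        // the queue now holds one complete block, from Started to Completed
        while (q.Count > 0)
        {
            Console.WriteLine(q.Dequeue());
        }
        inBlock = false;
    }
}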
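If you would rather collect the blocks than print them, the same line-by-line idea can be wrapped in a method. This is only a sketch: the class LineScanner and the method ExtractBlocks are hypothetical names, and it assumes a block always begins on a line containing "Process Started" and ends on the next line containing "Process Completed".

```csharp
using System;
using System.Collections.Generic;

public static class LineScanner
{
    // Hypothetical helper: returns each Started..Completed block as a list of lines.
    public static List<List<string>> ExtractBlocks(string log)
    {
        var blocks = new List<List<string>>();
        List<string> current = null;   // null while we are outside a block

        // Splitting on both "\r\n" and "\n" keeps stray '\r' characters out of the lines.
        foreach (var line in log.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None))
        {
            if (line.Contains("Process Started"))
                current = new List<string>();   // a new block begins on this line

            if (current == null)
                continue;                       // ignore lines between blocks

            current.Add(line);

            if (line.Contains("Process Completed"))
            {
                blocks.Add(current);            // block is complete
                current = null;
            }
        }
        return blocks;
    }

    public static void Main()
    {
        string log =
            "2016-11-28 12:18:59.5286 | 14 | Info | Process Started -ABC *****\r\n" +
            "....\r\n" +
            "2016-11-28 12:18:59.5286 | 14 | Info | Process Completed -ABC, Status: Failed***\r\n" +
            "\r\n" +
            "2016-11-28 13:18:59.5286 | 14 | Info | Process Started -DEF\r\n" +
            "....\r\n" +
            "2016-11-28 13:18:59.5286 | 14 | Info | Process Completed -DEF Status: Passed***\r\n";

        foreach (var block in ExtractBlocks(log))
        {
            Console.WriteLine(string.Join(Environment.NewLine, block));
            Console.WriteLine("-----");
        }
    }
}
```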
