RegEx pattern to match over a single line or more

I'm parsing a log file to identify and retrieve information about failures. Regular expressions seem to be the right way to go about this.

Here's my initial pattern: \d{4}-\d{2}-\d{2} \d{2}.*

This works well for single lines like this:

2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0

This doesn't work for information that spans multiple lines:

2011-02-06 02:19:04.4087|FATAL|ClassName|Message  
Failure data  
Additional message |StackLine:0:0

Here is what a couple of entries in the log look like:

2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0
4th StackLine:0:0  
3rd StackLine:0:0  
2nd StackLine:0:0  
1st StackLine:0:0 

2011-02-06 02:19:04.4087|FATAL|ClassName|Message  
Failure data  
Additional message |7th StackLine:0:0  
6th StackLine:0:0  
5th StackLine:0:0  
4th StackLine:0:0  
3rd StackLine:0:0  
2nd StackLine:0:0  
1st StackLine:0:0

The phrase "StackLine" represents a method signature in the dumped call stack. For example, here are two different "StackLine" examples:

ExecuteCodeWithGuaranteedCleanup at offset 0 in file:line:column <filename unknown>:0:0  

and

OnXmlMsgReceived at offset 128 in file:line:column d:\buildserver\source\svnroot\DepotManager\trunk\src\DepotManager.Core\Gating\AutoGate\Wherenet\Zla\EventSink.cs:115:17

In an ideal world, I would capture each entry from the timestamp through that first line:character notation (which is frequently 0:0).

How would I go about creating a pattern that would match both?

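This will match a line starting with a date and all following lines that do not start with a date. (Rubular tests Ruby regexes, where ^ and $ always match at line boundaries; in .NET, PCRE or Python you need to enable multiline mode to get the same behavior.)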

^\d{4}-\d{2}-\d{2} \d{2}.*$(?:\n(?!\d{4}-\d{2}-\d{2}).*)*

Here is a Rubular example: http://www.rubular.com/r/1BIoLZ5tfs
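A minimal sketch of the pattern above in Python, assuming the log has already been read into a string. Because ^ and $ only act as line anchors in multiline mode outside Ruby, the sketch passes re.MULTILINE; the sample lines are trimmed from the question.

```python
import re

# A timestamped line plus every following line that does NOT start with a date.
# re.MULTILINE makes ^ and $ match at line boundaries, which Ruby (and therefore
# Rubular) does by default.
ENTRY = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}.*$(?:\n(?!\d{4}-\d{2}-\d{2}).*)*",
    re.MULTILINE,
)

# Sample lines trimmed from the question's log.
log = "\n".join([
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0",
    "4th StackLine:0:0",
    "1st StackLine:0:0",
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message",
    "Failure data",
    "Additional message |7th StackLine:0:0",
    "6th StackLine:0:0",
])

entries = [m.group(0) for m in ENTRY.finditer(log)]
for entry in entries:
    lines = entry.splitlines()
    # Print the timestamp and how many lines the entry spans.
    print(f"{lines[0][:24]} -> {len(lines)} lines")
```

Each match keeps the whole entry, including the stack lines, so you can still post-process it (for example, cut it at the first :0:0).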

Edit 2: If you want to stop at the first :0:0, you can use the following regex, as long as the option that lets the . character also match newlines is enabled (Ruby calls it multiline, /m; .NET calls it RegexOptions.Singleline; PCRE uses the s modifier). Outside Ruby you also need multiline mode so that ^ matches at the start of every line:

^\d{4}-\d{2}-\d{2} \d{2}:.*?:\d+:\d+

And here is a new Rubular: http://www.rubular.com/r/rfR1wqDHR8
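As a sketch of the "Edit 2" pattern outside Ruby, here it is in Python with both flags it relies on: re.DOTALL (Python's equivalent of Ruby's /m) and re.MULTILINE for ^. The sample log is a hypothetical trimmed version of the question's data.

```python
import re

# ^ must match at each line start (re.MULTILINE), and . must also match
# newlines (re.DOTALL) so the lazy .*? can cross line breaks until it
# reaches the first ":<digits>:<digits>".
FIRST_FRAME = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:.*?:\d+:\d+",
    re.MULTILINE | re.DOTALL,
)

log = "\n".join([
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0",
    "4th StackLine:0:0",
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message",
    "Failure data",
    "Additional message |7th StackLine:0:0",
    "6th StackLine:0:0",
])

# findall returns the full match text because the pattern has no groups.
heads = FIRST_FRAME.findall(log)
for head in heads:
    print(repr(head))
```

The \d{2}: right after the hour consumes the first colon of the timestamp, and the seconds are followed by "." rather than another colon, so the time itself is never mistaken for a line:column pair.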

using System.Text.RegularExpressions;

var log = @"2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |7th StackLine:0:0
6th StackLine:0:0
5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |7th StackLine:0:0
6th StackLine:0:0
5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0";

// One match per timestamp. Note that the trailing lazy .*? matches zero
// characters here, so each match holds only the "yyyy-MM-dd HH" prefix.
var regex = @"\d{4}-\d{2}-\d{2}\s\d{2}.*?";
var matches = Regex.Matches(log, regex);
var count = matches.Count; // count = 4
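In the snippet above nothing follows the lazy .*?, so the engine stops as soon as it can and .*? consumes no characters. The count is still right (one match per timestamp), but the matches do not contain the entry text. A quick check in Python, whose re module behaves the same way for this pattern, with a hypothetical two-entry log:

```python
import re

# Same pattern as the C# snippet: the trailing lazy .*? matches the empty string.
pattern = r"\d{4}-\d{2}-\d{2}\s\d{2}.*?"

log = "\n".join([
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0",
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message",
    "Failure data",
    "Additional message |7th StackLine:0:0",
])

matches = re.findall(pattern, log)
print(len(matches), matches)
```

So this is a reliable way to count entries, but extracting them needs a pattern that actually consumes the text, such as the ones in the other answers.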

Here is a regular expression that matches all your lines:
\d{4}-\d{2}-\d{2} \d{2}[\S\s]*
Because [\S\s]* is greedy, a single match runs from the first timestamp to the end of the input.

The reason your regex didn't work is that, by default, . matches any character except a newline, so .* stops at the end of the first line. [\S\s] matches any character at all, including newlines.

PCRE has modifiers, and you need PCRE_DOTALL. You didn't specify a language, so I can only give you a PHP example: pass '/\d{4}-\d{2}-\d{2} \d{2}.*/s' to preg_match; the trailing s modifier turns on PCRE_DOTALL.
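To make the difference concrete, here is a small Python sketch comparing plain ., . with re.DOTALL (Python's counterpart of PCRE_DOTALL and the /s modifier), and the flag-free [\s\S] class on a two-line entry taken from the question:

```python
import re

text = "2011-02-06 02:19:04.4087|FATAL|ClassName|Message\nFailure data"

# Default: . stops at a newline, so only the first line is matched.
plain = re.search(r"\d{4}-\d{2}-\d{2} \d{2}.*", text).group(0)

# re.DOTALL lets . match newlines as well.
dotall = re.search(r"\d{4}-\d{2}-\d{2} \d{2}.*", text, re.DOTALL).group(0)

# [\s\S] matches any character without needing a flag.
char_class = re.search(r"\d{4}-\d{2}-\d{2} \d{2}[\s\S]*", text).group(0)

print(repr(plain))
print(repr(dotall))
print(repr(char_class))
```

Both greedy multi-line forms run to the end of the input, so on a full log they would swallow every following entry as well.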

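using System.Text.RegularExpressions;

// Each entry runs from a timestamped line up to the next blank line.
// Multiline makes ^ and $ match at the start and end of every line.
var rx = new Regex(@"^\d{4}-\d{2}-\d{2} \d{2}[\s\S]*?$^\s*$",
                   RegexOptions.Multiline);

// yourText holds the contents of the log file.
var matches = rx.Matches(yourText);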

Be aware that in .NET \d matches any Unicode decimal digit (all the characters in the Unicode 'Number, Decimal Digit' category), not just the European digits 0-9. Since your file format is quite fixed, this shouldn't cause any problems; if you want to be strict, use [0-9] instead.
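Python's re behaves the same way for string patterns, which makes the point easy to check. This sketch uses a hypothetical input, the year 2011 written in Arabic-Indic digits (U+0660 to U+0669); it illustrates the behavior and is not .NET code.

```python
import re

# "2011" written with Arabic-Indic digits.
arabic_year = "\u0662\u0660\u0661\u0661"

# For str patterns, \d matches any Unicode decimal digit (category Nd).
unicode_d = re.fullmatch(r"\d{4}", arabic_year)

# An explicit [0-9] class, or the re.ASCII flag, accepts only ASCII digits.
ascii_class = re.fullmatch(r"[0-9]{4}", arabic_year)
ascii_flag = re.fullmatch(r"\d{4}", arabic_year, re.ASCII)

print(unicode_d is not None, ascii_class is not None, ascii_flag is not None)
```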

This will work only if there is a blank line at the end of each log entry. Even the last entry must be followed by a blank line, so the format must be:

2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine
secondary line of the previous line
(blank)
2011-02-06 02:17:56.9886|FATAL|ClassName|Failure data|StackLine
(blank)
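Here is a sketch of the same blank-line-delimited pattern in Python with re.MULTILINE, run on the format shown above. In multiline mode, $^ can only succeed at a position that is both the end and the start of a line, which happens only on an empty line; that is why the lazy [\s\S]*? stops there.

```python
import re

# From a timestamped line, lazily up to the first empty line.
ENTRY = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}[\s\S]*?$^\s*$", re.MULTILINE)

text = (
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine\n"
    "secondary line of the previous line\n"
    "\n"
    "2011-02-06 02:17:56.9886|FATAL|ClassName|Failure data|StackLine\n"
    "\n"
)

# Each match carries trailing newlines, so strip them off.
entries = [m.group(0).strip() for m in ENTRY.finditer(text)]
for entry in entries:
    print(repr(entry))
```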
