RegEx pattern to match a single line or multiple lines

Question

I'm parsing a log file to identify and retrieve information about failures. Regular expressions seem to be the right way to go about this.

Here's my initial pattern: \d{4}-\d{2}-\d{2} \d{2}.*

This works well for single lines like this:

2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0

It doesn't work for information that spans multiple lines:

2011-02-06 02:19:04.4087|FATAL|ClassName|Message  
Failure data  
Additional message |StackLine:0:0

Here is what a couple of entries in the log look like:

2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0
4th StackLine:0:0  
3rd StackLine:0:0  
2nd StackLine:0:0  
1st StackLine:0:0 

2011-02-06 02:19:04.4087|FATAL|ClassName|Message  
Failure data  
Additional message |7th StackLine:0:0  
6th StackLine:0:0  
5th StackLine:0:0  
4th StackLine:0:0  
3rd StackLine:0:0  
2nd StackLine:0:0  
1st StackLine:0:0

The phrase "StackLine" represents a method signature in the dumped call stack. For example, here are two different "StackLine" examples:

ExecuteCodeWithGuaranteedCleanup at offset 0 in file:line:column <filename unknown>:0:0

and

OnXmlMsgReceived at offset 128 in file:line:column d:\buildserver\source\svnroot\DepotManager\trunk\src\DepotManager.Core\Gating\AutoGate\Wherenet\Zla\EventSink.cs:115:17

In an ideal world, I would just get each entry from the timestamp through the first line:character notation (which is frequently 0:0).

How would I go about creating a pattern that would match both the single-line and the multi-line entries?

Answer 1

This will match a line starting with a date and all lines following it that do not start with a date.

^\d{4}-\d{2}-\d{2} \d{2}.*$(?:\n(?!\d{4}-\d{2}-\d{2}).*)*
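For anyone who wants to try this outside Rubular, here is a minimal sketch in Python. The language is my assumption, since the question does not name one, and the names log, ENTRY and entries are only illustrative. The sample is abbreviated from the log in the question. Python needs re.MULTILINE so that ^ and $ match at line boundaries.

```python
import re

# Abbreviated sample built from the log lines in the question.
log = "\n".join([
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0",
    "4th StackLine:0:0",
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message",
    "Failure data",
    "Additional message |7th StackLine:0:0",
    "6th StackLine:0:0",
])

ENTRY = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}.*$"        # a line that starts with a date
    r"(?:\n(?!\d{4}-\d{2}-\d{2}).*)*",    # following lines that do not
    re.MULTILINE,
)

# One match per log entry, continuation lines included.
entries = [m.group(0) for m in ENTRY.finditer(log)]
for entry in entries:
    print(repr(entry))
```

Note that any blank line between two entries is attached to the preceding entry, because a blank line does not start with a date either.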

Here is a Rubular example: http://www.rubular.com/r/1BIoLZ5tfs

Edit 2: If you want to stop at the first :0:0, you can use the following regex, as long as the option that lets the . character match newlines is enabled (Ruby and Rubular call it multi-line mode; most other engines call it dot-all or single-line mode):

^\d{4}-\d{2}-\d{2} \d{2}:.*?:\d+:\d+
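As a rough illustration of the flags involved, here is the same pattern in Python (my choice of language, not the asker's; the variable names are made up for the example). In Python the "dot matches newline" option is re.DOTALL, and re.MULTILINE is still needed so that ^ matches at the start of every line rather than only at the start of the input.

```python
import re

# Abbreviated sample built from the log lines in the question.
log = "\n".join([
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0",
    "4th StackLine:0:0",
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message",
    "Failure data",
    "Additional message |7th StackLine:0:0",
    "6th StackLine:0:0",
])

# Lazy .*? stops at the first ":digits:digits" after the timestamp.
FIRST_FRAME = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:.*?:\d+:\d+",
    re.MULTILINE | re.DOTALL,
)

matches = [m.group(0) for m in FIRST_FRAME.finditer(log)]
for text in matches:
    print(repr(text))
```

The timestamp itself (for example 02:17:54.9886) is not mistaken for the end marker, because its second colon-separated number is followed by a dot rather than a third colon.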

And here is a new Rubular: http://www.rubular.com/r/rfR1wqDHR8

Answer 2

using System.Text.RegularExpressions;

// Sample log: four entries, three of them spanning several lines.
var log = @"2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |7th StackLine:0:0
6th StackLine:0:0
5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |7th StackLine:0:0
6th StackLine:0:0
5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0";
// The trailing lazy .*? matches the empty string, so each match is only the
// date-and-hour prefix. This counts the entries; it does not capture their text.
var regex = @"\d{4}-\d{2}-\d{2}\s\d{2}.*?";
var matches = Regex.Matches(log, regex);
var count = matches.Count; // count = 4
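The snippet above finds where each entry starts. If the entry text is wanted as well, one possible follow-up (not part of the original answer) is to slice the log between consecutive match positions. Below is a small Python sketch of that idea; the language, the line-anchored header pattern and all names are my own assumptions.

```python
import re

# Abbreviated sample built from the log lines in the question.
log = "\n".join([
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0",
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message",
    "Failure data",
    "Additional message |7th StackLine:0:0",
])

# Matches only the date-and-hour prefix at the start of a line.
HEADER = re.compile(r"^\d{4}-\d{2}-\d{2}\s\d{2}", re.MULTILINE)

# Offsets where entries begin, plus the end of the log as a final boundary.
starts = [m.start() for m in HEADER.finditer(log)]
bounds = starts + [len(log)]

# Each entry runs from one header offset to the next.
entries = [log[a:b].rstrip("\n") for a, b in zip(bounds, bounds[1:])]
print(len(entries))
```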

Answer 3

Here is a regular expression that matches all your lines:
\d{4}-\d{2}-\d{2} \d{2}[\S\s]*

The reason your regex didn't work is that, by default, the dot does not match newline characters in most regex engines, so it is not a true "match everything".
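To make the difference concrete, here is a short Python comparison (Python is my assumption; the names are illustrative). One thing worth knowing before relying on this pattern: [\S\s]* is greedy, so it runs from the first timestamp to the end of the input and returns everything as a single match rather than one match per entry.

```python
import re

# Two entries, the second spanning three lines.
log = "\n".join([
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0",
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message",
    "Failure data",
    "Additional message |StackLine:0:0",
])

# Default dot: stops at each newline, so continuation lines are lost.
dot_matches = re.findall(r"\d{4}-\d{2}-\d{2} \d{2}.*", log)

# [\S\s] matches any character, newlines included, and * is greedy.
any_matches = re.findall(r"\d{4}-\d{2}-\d{2} \d{2}[\S\s]*", log)

print(len(dot_matches), len(any_matches))
```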

Answer 4

PCRE has modifiers, and you need PCRE_DOTALL. You didn't specify a language, so I can't give you more than a PHP example: preg_match('/\d{4}-\d{2}-\d{2} \d{2}.*/s'
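Other engines expose the same switch under different names. As a hedged sketch, this is how it looks in Python, where the flag is re.DOTALL and the inline form (?s) corresponds to the /s modifier shown above. Python and the variable names are my assumptions, not something the question specifies.

```python
import re

# A single entry that spans three lines.
log = "\n".join([
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message",
    "Failure data",
    "Additional message |StackLine:0:0",
])

pattern = r"\d{4}-\d{2}-\d{2} \d{2}.*"

without_flag = re.search(pattern, log).group(0)            # first line only
with_flag = re.search(pattern, log, re.DOTALL).group(0)    # whole entry
inline_flag = re.search(r"(?s)" + pattern, log).group(0)   # same, inline form

print(repr(without_flag))
print(repr(with_flag))
```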

Answer 5

var rx = new Regex(@"^\d{4}-\d{2}-\d{2} \d{2}[\s\S]*?$^\s*$", 
                   RegexOptions.Multiline);

var matches = rx.Matches(yourText);

Be aware that \d can also match non-European digits, but since your file format is quite fixed, this shouldn't cause any problems (\d matches every Unicode character in the 'Number, Decimal Digit' category).

This will work only if there is a blank line at the end of each log entry. Even the last entry must be followed by a blank line, so the format must be

2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine
secondary line of the previous line
(blank)
2011-02-06 02:17:56.9886|FATAL|ClassName|Failure data|StackLine
(blank)
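To check the blank-line requirement, here is the same pattern tried in Python on the format shown above. This is only a sketch: Python is my assumption, its ^ and $ handling under re.MULTILINE is close to but not guaranteed identical to .NET's, and the names are illustrative. Each match is stripped because it can include the trailing newline of the entry.

```python
import re

# The required format: every entry is followed by a blank line.
text = (
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine\n"
    "secondary line of the previous line\n"
    "\n"
    "2011-02-06 02:17:56.9886|FATAL|ClassName|Failure data|StackLine\n"
    "\n"
)

# $^ can only match at the start of an empty line, which ends the entry.
rx = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}[\s\S]*?$^\s*$", re.MULTILINE)

records = [m.group(0).strip() for m in rx.finditer(text)]
for record in records:
    print(repr(record))
```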

5 answers

Answer 1: score 2, accepted, 2011-02-22 19:01:27
Answer 2: score 1, 2011-02-22 19:22:24
Answer 3: score 0, 2011-02-22 18:44:07
Answer 4: score 0, 2011-02-22 18:48:23
Answer 5: score 0, 2011-02-22 19:26:38
