Matching characters (inc. newlines) in a regex until next match is found

I am trying to parse a log file using a regex. The problem is that as soon as I turn on SingleLine mode, so that I can include multi-line errors, later matches are swallowed by the first match instead of being matched on their own.

To explain better, here is an example of a log file:

ERROR 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

ERROR 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

test

ERROR 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

ERROR 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

INFO 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

test 2

ERROR 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

ERROR 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

I am using the following regex:

.{5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - .+

This matches each line correctly but excludes the part of the message that has run onto a new line. When I turn on SingleLine mode, however, there is only one match (the first), and all the other entries are included in it.
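The behaviour described above can be reproduced with a small sketch. It is written in Python, where `re.DOTALL` plays the role of SingleLine mode; the language choice and the shortened sample log are assumptions made for illustration, not part of the original setup.

```python
import re

# Hypothetical sample: the second entry runs onto an extra line.
log = (
    "ERROR 16-08 11:09:59,015 - first\n"
    "ERROR 16-08 11:09:59,015 - second\n"
    "continued\n"
    "ERROR 16-08 11:09:59,015 - third\n"
)

pattern = r".{5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - .+"

# Default mode: "." stops at a newline, so "continued" is never captured.
per_line = re.findall(pattern, log)

# DOTALL: the greedy ".+" runs to the end of the input, giving one big match.
single_line = re.findall(pattern, log, flags=re.DOTALL)

print(len(per_line))     # 3
print(len(single_line))  # 1
```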

Can anyone point me in the right direction?

Thanks :)

Basically, the idea behind this solution is to tell your regex not what to include but where to stop.

This regex uses a positive lookahead to stop non-greedily at the next occurrence of your pattern (or at the end of the whole string):

.{5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - .+?(?=(.{5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})|\z)
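As a rough illustration of the lookahead approach, here is a minimal Python sketch. Two things are assumptions: the sample log is made up, and `\Z` is swapped in for `\z`, because in Python's `re` module `\Z` is the anchor that matches only at the very end of the string. `re.DOTALL` stands in for SingleLine mode.

```python
import re

# Hypothetical sample: the second entry runs onto an extra line.
log = (
    "ERROR 16-08 11:09:59,015 - first\n"
    "ERROR 16-08 11:09:59,015 - second\n"
    "continued\n"
    "ERROR 16-08 11:09:59,015 - third\n"
)

entry = re.compile(
    # Entry header followed by a lazy message body...
    r".{5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - .+?"
    # ...which stops right before the next header or at the end of the input.
    r"(?=(.{5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})|\Z)",
    re.DOTALL,
)

# group(0) is the whole match; strip() drops the line break kept at the edges.
entries = [m.group(0).strip() for m in entry.finditer(log)]

for e in entries:
    print(repr(e))
```

`finditer` is used rather than `findall` because the lookahead contains a capturing group, and `findall` would return that group instead of the full match.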

This also includes the INFO line as part of the previous error message. That seems a little buggy, so if you want to treat the INFO line as a separate message (not part of the previous one), you might consider using this regex instead:

.{4,5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - .+?(?=.{4,5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}|\z)
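A minimal Python sketch of the `.{4,5}` variant follows, using a made-up log that contains an INFO entry. Two assumptions are worth flagging: an end-of-string alternative (`\Z`, Python's counterpart of `\z`) is included in the lookahead, since without it the final entry has no following header to stop at and would not match; and the results are stripped, because with `re.DOTALL` the `.{4,5}` prefix can absorb the line break in front of a four-letter level such as INFO.

```python
import re

# Hypothetical sample with an INFO entry that runs onto an extra line.
log = (
    "ERROR 16-08 11:09:59,015 - first\n"
    "INFO 16-08 11:09:59,015 - second\n"
    "continued\n"
    "ERROR 16-08 11:09:59,015 - third\n"
)

header = r".{4,5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"

# Lazy body, stopping before the next header or at the end of the input.
entry = re.compile(header + r" - .+?(?=" + header + r"|\Z)", re.DOTALL)

# strip() removes line breaks picked up at either edge of a match.
entries = [m.group(0).strip() for m in entry.finditer(log)]

for e in entries:
    print(repr(e))
```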

From your example text file it looks like there may be some blank lines. If that's OK, you should be able to use this regex:

^(?:ERROR) \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - (?:(?!ERROR|INFO)(?:[a-z0-9A-Z ,:\-\t]*)\n)+

If that was just a mistake and blank lines are not wanted, replace the last + with *:

^(?:ERROR) \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - (?:(?!ERROR|INFO)(?:[a-z0-9A-Z ,:\-\t]*)\n)*

This won't match the INFO line, but you wrote that you want only errors. If there are other message formats (WARNING, perhaps), you must add them to this part: (?!ERROR|INFO)

Since you have no capturing groups in your regex, I used the non-capturing (?:...) variant.
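The line-by-line approach can be sketched in Python as follows. The `LEVELS` tuple, the WARNING level, and the sample log are all hypothetical; the negative lookahead is built from the tuple so that a new level only has to be added in one place. Note that `^` only matches at the start of every line when multiline mode is on (`re.MULTILINE` here), and that the character class limits which characters a message may contain.

```python
import re

# Levels that start a new entry; WARNING is a hypothetical extra level.
LEVELS = ("ERROR", "INFO", "WARNING")
stop = "|".join(LEVELS)

error_entry = re.compile(
    # Header of an ERROR entry, anchored at the start of a line.
    r"^(?:ERROR) \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - "
    # One or more lines that do not begin with a known level.
    r"(?:(?!" + stop + r")(?:[a-z0-9A-Z ,:\-\t]*)\n)+",
    re.MULTILINE,
)

# Hypothetical sample log.
log = (
    "ERROR 16-08 11:09:59,015 - first\n"
    "INFO 16-08 11:09:59,015 - second\n"
    "ERROR 16-08 11:09:59,015 - third\n"
    "continued\n"
    "WARNING 16-08 11:09:59,015 - fourth\n"
)

# Drop the trailing line break that each match ends with.
errors = [m.group(0).rstrip("\n") for m in error_entry.finditer(log)]

for e in errors:
    print(repr(e))
```

Only the two ERROR entries are returned; the second one keeps its continuation line, and the INFO and WARNING lines act purely as stop markers.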

