
Matching characters (inc. newlines) in a regex until next match is found

I am trying to parse a log file using a regex. The problem is that as soon as I turn on Singleline mode, so that multi-line errors are included, subsequent entries are swallowed into the first match instead of being matched on their own.

To explain better, here is an example of a log file:

ERROR 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

ERROR 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

test

ERROR 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

ERROR 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

INFO 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

test 2

ERROR 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

ERROR 16-08 11:09:59,015 - sdsdfsdfsdfsdfsdf

I am using the following regex:

.{5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - .+

This matches each line correctly but excludes any part of a message that has run onto a new line. When I turn on Singleline mode, however, there is only one match (the first), and all the other entries are included in it.
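The behaviour can be reproduced with a short sketch. It is written in Python rather than .NET, on the assumption that `re.DOTALL` behaves like Singleline mode here; the sample log and variable names are made up for illustration.

```python
import re

# Hypothetical three-entry log; the second entry continues on the next line.
log = (
    "ERROR 16-08 11:09:59,015 - first\n"
    "ERROR 16-08 11:09:59,015 - second\n"
    "test\n"
    "ERROR 16-08 11:09:59,015 - third\n"
)

pattern = r".{5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - .+"

# Default mode: "." stops at a newline, so the continuation line "test" is lost.
per_line = re.findall(pattern, log)

# DOTALL: the greedy ".+" runs to the end of the input and swallows every entry.
single = re.findall(pattern, log, flags=re.DOTALL)

print(len(per_line), len(single))  # prints: 3 1
```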

Can anyone point me in the right direction?

Thanks :)

Basically, the idea behind this solution is to tell your regex not what to include but where to stop.

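This regex combines a lazy quantifier (.+?) with a positive lookahead, so each match stops at the next occurrence of your entry header (or at the end of the whole string). Singleline mode must stay on so that . can cross line breaks.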

.{5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - .+?(?=(.{5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})|\z)
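A minimal Python sketch of the lazy-quantifier-plus-lookahead idea, under a few assumptions: `re.DOTALL` stands in for Singleline mode, `\Z` is used for the end of the string (older Python versions do not accept `\z`), and the sample log and the names `HEADER`, `entry_pattern` and `entries` are illustrative only.

```python
import re

# Hypothetical sample log; the second entry spills onto a second line.
log = (
    "ERROR 16-08 11:09:59,015 - first\n"
    "ERROR 16-08 11:09:59,015 - second\n"
    "test\n"
    "ERROR 16-08 11:09:59,015 - third\n"
)

# The part that marks the beginning of an entry: level, date, time.
HEADER = r".{5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"

# Lazy ".+?" grows one character at a time until the lookahead sees
# either the next header or the end of the string.
entry_pattern = re.compile(HEADER + r" - .+?(?=" + HEADER + r"|\Z)", re.DOTALL)

# strip() drops the trailing newline that belongs to each match.
entries = [m.group(0).strip() for m in entry_pattern.finditer(log)]

for entry in entries:
    print(repr(entry))
```

The lookahead does not consume anything, so the header it finds is still available as the start of the next match.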

With .{5}, the four-letter INFO line may end up as part of the previous error message. That is a little buggy, so if you want the INFO line to be matched as an entry of its own (not as part of the previous one), you might use this regex instead:

.{4,5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - .+?(?=.{4,5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}|\z)
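As a rough illustration, the `.{4,5}` variant can be tried in Python as follows. The assumptions are the same as before: `re.DOTALL` in place of Singleline mode, `\Z` for the end of the string, and an invented sample log. Filtering on the `ERROR` prefix afterwards is one possible way to keep only errors, not something the regex does by itself.

```python
import re

# Hypothetical sample log with an INFO entry between two errors.
log = (
    "ERROR 16-08 11:09:59,015 - first\n"
    "test\n"
    "INFO 16-08 11:09:59,015 - second\n"
    "test 2\n"
    "ERROR 16-08 11:09:59,015 - third\n"
)

# A four- or five-character level, so that both INFO and ERROR start an entry.
HEADER = r".{4,5} \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"
entry_pattern = re.compile(HEADER + r" - .+?(?=" + HEADER + r"|\Z)", re.DOTALL)

# With DOTALL, "." can also absorb the newline in front of "INFO",
# so each match is stripped of surrounding whitespace.
entries = [m.group(0).strip() for m in entry_pattern.finditer(log)]

# Keep only the error entries; continuation lines stay attached to them.
errors = [e for e in entries if e.startswith("ERROR")]

for error in errors:
    print(repr(error))
```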

From your example text file it looks like there may be some blank lines. If that's OK, you should be able to use this regex (with Multiline mode on, so that ^ matches at the start of every line):

^(?:ERROR) \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - (?:(?!ERROR|INFO)(?:[a-z0-9A-Z ,:\-\t]*)\n)+

If that was just a formatting mistake and blank lines should not be part of a match, change the inner * to + so that every line must contain at least one character:

^(?:ERROR) \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - (?:(?!ERROR|INFO)(?:[a-z0-9A-Z ,:\-\t]+)\n)+

This won't match the INFO line, but you wrote that you want only errors. If there are other message levels (WARNING, perhaps), you must add them to this part: (?!ERROR|INFO)

Since you have no capturing groups in your regex, I used the non-capturing (?:...) variant.
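A small Python sketch of this line-by-line approach, using the variant that allows blank lines. It assumes `re.MULTILINE` plays the role of Multiline mode and that every line, including the last one, ends with a newline; the sample log is invented for the example.

```python
import re

# Hypothetical sample log; "test 2" belongs to the INFO entry.
log = (
    "ERROR 16-08 11:09:59,015 - first\n"
    "test\n"
    "INFO 16-08 11:09:59,015 - second\n"
    "test 2\n"
    "ERROR 16-08 11:09:59,015 - third\n"
)

# After the header, consume whole lines as long as a line does not
# start a new ERROR or INFO entry.
error_pattern = re.compile(
    r"^(?:ERROR) \d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - "
    r"(?:(?!ERROR|INFO)(?:[a-z0-9A-Z ,:\-\t]*)\n)+",
    re.MULTILINE,
)

errors = [m.group(0) for m in error_pattern.finditer(log)]

for error in errors:
    print(repr(error))
```

Note that the character class limits which characters a message may contain; a message with other punctuation would cut the match short, so the class would have to be widened for real logs.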
