regex for carriage return lines

Question

I am trying to write a regex for log files. It works fine for single-line log entries, but some entries contain carriage returns, and the regex then fails to pick up the lines that follow.

([0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?)\s?(.*)

The regex above works fine for entries with no extra carriage returns:

01 Jan 2018 04:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal
02 Jan 2018 05:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal

But it fails to pick up extra line 1 and extra line 2 when one of the entries contains added carriage returns:

01 Jan 2018 04:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal
02 Jan 2018 05:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal
extra line 1
extra line 2
03 Jan 2018 08:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal

I even tried adding ^ to anchor the match at the start, but that only picks up the first log entry:

^([0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?)\s?(.*)

Answer 1

You might use

(?<=\n|^)(\d{2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2}(?:,\d{3})?)\s?(.*?)(?=$|\n\d{2} [A-Za-z]{3} \d{4})
^^^^^^^^^                                                            ^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The important part is the lookahead at the end, which matches either the start of the next dated entry or the end of the string. Also make sure the . is repeated lazily (.*?). The beginning uses a lookbehind for \n or ^ instead of the m flag, so that the $ in the final lookahead matches only at the end of the string, not at the end of every line.

https://regex101.com/r/YAkWBe/1

Also remember that you can simplify [0-9] to \d.

If you can't use the s flag (which allows the dot to match a newline), then instead of repeating the dot to capture the (possibly multiline) text after the date, use [\s\S], which matches any character (every character is either whitespace or non-whitespace):

([\s\S]*?)
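As a rough illustration, here is how this approach might look in Python. This is a sketch under a few assumptions: Python's re module rejects the variable-width lookbehind (?<=\n|^), so it is rewritten here as (?:(?<=\n)|^), which should be equivalent; the body uses [\s\S]*? so no s flag is needed; and the sample log text is shortened from the question.

```python
import re

# Answer 1's pattern, adapted for Python's `re` module.
# No MULTILINE flag, so ^ and $ refer to the whole string, not to each line.
LOG_ENTRY = re.compile(
    r"(?:(?<=\n)|^)"                                            # start of string, or just after a newline
    r"(\d{2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2}(?:,\d{3})?)"   # group 1: timestamp
    r"\s?([\s\S]*?)"                                            # group 2: lazy body, may span lines
    r"(?=$|\n\d{2} [A-Za-z]{3} \d{4})"                          # stop at end of string or the next dated line
)

# Shortened sample data (the messages are abbreviated for the example).
sample = (
    "01 Jan 2018 04:25:56,546 [TEXT] first entry\n"
    "02 Jan 2018 05:25:56,546 [TEXT] second entry\n"
    "extra line 1\n"
    "extra line 2\n"
    "03 Jan 2018 08:25:56,546 [TEXT] third entry"
)

entries = LOG_ENTRY.findall(sample)
for timestamp, body in entries:
    print(timestamp, repr(body))
```

With this input, the second entry's body should include both extra lines, and the last entry is matched through the $ branch of the lookahead.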

Answer 2

I can offer the following regex, which works fine except that it doesn't capture the very last log entry in your file (the lookahead requires another timestamp to follow, and nothing follows the last entry):

([0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?)\s?(.*?)(?=[0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3}))

Long story short, I added a lookahead to the end of your pattern, after the (.*), which stops the match when it encounters the start of the next log entry. The only other change is to use (.*?), i.e. make the dot lazy so that it stops at the lookahead.

Also, this regex should be run in dot-all mode, where .* matches across lines. If dot-all mode is not explicitly available, you may be able to use [\s\S]* as an alternative.
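To make the last-entry limitation concrete, here is a small Python sketch using re.DOTALL as the dot-all mode. The names TS, without_end and with_end are hypothetical, the sample text is shortened from the question, and the |\Z alternative in the second pattern is an assumed tweak that is not part of the answer's regex. The milliseconds group is also kept optional in the lookahead here, unlike the original.

```python
import re

# Timestamp sub-pattern taken from the question's regex.
TS = r"[0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?"

# Same shape as the answer's pattern: lazy body, lookahead for the next timestamp.
without_end = re.compile(rf"({TS})\s?(.*?)(?={TS})", re.DOTALL)

# Assumed tweak: also let the lookahead succeed at the end of the string (\Z).
with_end = re.compile(rf"({TS})\s?(.*?)(?={TS}|\Z)", re.DOTALL)

# Shortened sample data (the messages are abbreviated for the example).
sample = (
    "01 Jan 2018 04:25:56,546 [TEXT] first entry\n"
    "02 Jan 2018 05:25:56,546 [TEXT] second entry\n"
    "extra line 1\n"
    "extra line 2\n"
    "03 Jan 2018 08:25:56,546 [TEXT] third entry"
)

print(len(without_end.findall(sample)))  # the last entry has no following timestamp
print(len(with_end.findall(sample)))     # the \Z branch lets the last entry match

# Bodies keep the newline that precedes the next timestamp, so strip it if unwanted.
for timestamp, body in with_end.findall(sample):
    print(timestamp, repr(body.rstrip("\n")))
```

Note that this lookahead is not anchored to a line start, so a message that happens to contain a timestamp-like string in the middle of a line would be split there.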


2 answers

solution1
1 2018-08-08 05:47:38

solution2
0 2018-08-08 05:44:36
