
Regex for carriage return lines

I am trying to write a regex for log files. It works fine for single-line log entries, but some entries contain carriage returns, and the regex then fails to pick up the lines that follow.

([0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?)\s?(.*)

The regex above works fine for entries with no extra carriage returns:

01 Jan 2018 04:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal
02 Jan 2018 05:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal

But it fails to pick up extra line 1 and extra line 2 when one of the entries contains added carriage returns:

01 Jan 2018 04:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal
02 Jan 2018 05:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal
extra line 1
extra line 2
03 Jan 2018 08:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal

I even tried adding ^ to match the start of the line, but that only picks up the first log entry:

^([0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?)\s?(.*)

You might use:

(?<=\n|^)(\d{2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2}(?:,\d{3})?)\s?(.*?)(?=$|\n\d{2} [A-Za-z]{3} \d{4})
^^^^^^^^^                                                            ^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The important part is the lookahead at the end, which looks for either the date that starts the next entry or the end of the string. Also make sure to make the dot repetition lazy (.*?). The beginning uses a lookbehind for \n or ^ instead of the m flag, so that the $ in the final lookahead only matches the end of the string, not the end of every line.

https://regex101.com/r/YAkWBe/1
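As a rough illustration, the lookahead idea above could be tried in Python along these lines. This is a sketch under assumptions, not the answer's exact pattern: Python's re module rejects (?<=\n|^) because its lookbehinds must have a fixed width, so the sketch swaps in ^ with re.MULTILINE for the entry start and \Z (which matches only at the very end of the string, regardless of flags) in place of $. The names LOG, DATE, ENTRY, and entries are made up for the example.

```python
import re

# Sample log text from the question; the second entry spans three lines.
LOG = (
    "01 Jan 2018 04:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) "
    "this.is.sample.log: service is responding normal\n"
    "02 Jan 2018 05:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) "
    "this.is.sample.log: service is responding normal\n"
    "extra line 1\n"
    "extra line 2\n"
    "03 Jan 2018 08:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) "
    "this.is.sample.log: service is responding normal"
)

DATE = r"\d{2} [A-Za-z]{3} \d{4}"

ENTRY = re.compile(
    # Group 1: timestamp at the start of a line (milliseconds optional).
    r"^(" + DATE + r" \d{2}:\d{2}:\d{2}(?:,\d{3})?)\s?"
    # Group 2: lazily take everything, newlines included...
    r"(.*?)"
    # ...until the next line starts with a date, or the string ends.
    r"(?=\n" + DATE + r"|\s*\Z)",
    re.MULTILINE | re.DOTALL,
)

entries = [(m.group(1), m.group(2)) for m in ENTRY.finditer(LOG)]

for timestamp, message in entries:
    print(timestamp, "->", repr(message))
```

The continuation lines end up inside the message of the second entry because the lazy group only stops where a new date begins on the following line.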

Also remember that you can simplify [0-9] to \d.
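A quick check of that simplification, assuming Python's re module (other engines may behave differently): both spellings accept the sample timestamp. One caveat worth knowing is that for str patterns Python's \d also accepts non-ASCII decimal digits unless re.ASCII is set, which rarely matters for logs like these. The variable names are illustrative only.

```python
import re

stamp = "01 Jan 2018 04:25:56,546"

long_form = r"[0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?"
short_form = r"\d{2}\s[A-Za-z]{3}\s\d{4}\s\d{2}:\d{2}:\d{2}(?:,\d{3})?"

# Both forms accept the ASCII timestamp from the question.
same_on_ascii = bool(re.fullmatch(long_form, stamp)) and bool(re.fullmatch(short_form, stamp))

# Caveat: \d on a str pattern is Unicode-aware by default in Python 3.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_digit = bool(re.fullmatch(r"\d", arabic_three))
ascii_only = bool(re.fullmatch(r"\d", arabic_three, re.ASCII))

print(same_on_ascii, unicode_digit, ascii_only)
```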

If you can't use the s flag (which lets the dot match a newline), then instead of repeating the dot to capture the (possibly multiline) text after the date, use [\s\S], which matches any character (every whitespace character plus every non-whitespace character, so everything):

([\s\S]*?)
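A small Python sketch of the difference, using a made-up two-line string rather than the real log: without a dot-all flag the dot stops at the newline, while [\s\S] crosses it with no flag at all.

```python
import re

text = "first line\nsecond line\nEND"

# Without re.DOTALL the dot cannot cross the newline, so END is unreachable.
dot_match = re.search(r"first(.*?)END", text)

# [\s\S] matches any character, newline included, with no flag needed.
class_match = re.search(r"first([\s\S]*?)END", text)

print(dot_match)
print(repr(class_match.group(1)))
```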

I can offer the following regex, which works fine except that it doesn't capture the very last log entry in your file:

([0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?)\s?(.*?)(?=[0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?)

Long story short, I added a lookahead to the end of your pattern, after the (.*), which stops the match when it encounters the start of the next log entry. The only other change is to use (.*?), i.e. make the dot lazy so that it stops at the lookahead.

Also, this regex should be run in dot-all mode, where .* matches across lines. If your regex engine doesn't offer a dot-all mode, you may be able to use [\s\S]* as an alternative.
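To make the last-entry limitation concrete, here is a hedged Python sketch using a shortened \d spelling of the timestamp and a made-up three-entry log. The second pattern is not part of this answer: it adds an end-of-string alternative (\Z) to the lookahead as one possible way to pick up the final entry. STAMP, without_end, with_end, missed, and found are illustrative names.

```python
import re

# Shortened stand-in for the log; the second entry has a continuation line.
LOG = (
    "01 Jan 2018 04:25:56,546 first\n"
    "02 Jan 2018 05:25:56,546 second\n"
    "extra line 1\n"
    "03 Jan 2018 08:25:56,546 third"
)

STAMP = r"\d{2}\s[A-Za-z]{3}\s\d{4}\s\d{2}:\d{2}:\d{2}(?:,\d{3})?"

# Lookahead requires another timestamp, so the final entry never matches.
without_end = re.compile(r"(" + STAMP + r")\s?(.*?)(?=" + STAMP + r")", re.DOTALL)

# Assumed tweak: also accept the end of the string as a stopping point.
with_end = re.compile(r"(" + STAMP + r")\s?(.*?)(?=" + STAMP + r"|\Z)", re.DOTALL)

missed = [m.group(1) for m in without_end.finditer(LOG)]
found = [m.group(1) for m in with_end.finditer(LOG)]

print(len(missed), len(found))
```

Because the lookahead here does not include the preceding newline, each captured message keeps its trailing line break; stripping it afterwards is one option.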

