Regex to match a pattern until its next occurrence

Question

I have the following data:

2018-03-20 23:28:47 INFO This is an info sample(can be multiline with new line characters)
2018-03-20 23:28:47 INFO This is an info sample(can be multiline with new line characters)
2018-03-20 23:28:47 DEBUG This is a debug sample(can be multiline with new line characters) {
  'x':1,
  'y':2,
  'z':3,
  'w':4
}
2018-03-20 23:28:47 INFO This is an info sample(can be multiline with new line characters)
2018-03-20 23:28:47 DEBUG This is a debug sample(can be multiline with new line characters){
  'a':5,
  'b':6,
  'c':7,
  'd':8
}

I have to extract all DEBUG statements, and for that I am using the regex (\d{4}\-\d{2}\-\d{2}\ \d{2}\:\d{2}\:\d{2}\ DEBUG(.|\n|\r)*?)(?=\d{4}\-\d{2}\-\d{2}\ \d{2}\:\d{2}\:\d{2}), but it omits the last DEBUG statement. What should the regex be to obtain the following output?

2018-03-20 23:28:47 DEBUG This is a debug sample(can be multiline with new line characters) {
  'x':1,
  'y':2,
  'z':3,
  'w':4
}
2018-03-20 23:28:47 DEBUG This is a debug sample(can be multiline with new line characters){
  'a':5,
  'b':6,
  'c':7,
  'd':8
}
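For readers who want to reproduce the problem, here is a minimal Python sketch. The shortened sample in log and the helper name TS are made up for illustration, and the redundant escapes from the pattern above are dropped. The last DEBUG entry is skipped because the lookahead demands another timestamp after it, and at the end of the data there is none.

```python
import re

# Shortened, hypothetical sample with the same shape as the data above.
log = """\
2018-03-20 23:28:47 INFO first
2018-03-20 23:28:47 DEBUG second {
  'x':1
}
2018-03-20 23:28:47 INFO third
2018-03-20 23:28:47 DEBUG fourth {
  'a':5
}
"""

# TS is an illustrative helper name for the timestamp sub-pattern.
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"

# Same structure as the pattern in the question: a DEBUG entry, then a
# lookahead that requires a following timestamp.
original = re.compile("(" + TS + r" DEBUG(.|\n|\r)*?)(?=" + TS + ")")

matches = [m.group(1) for m in original.finditer(log)]
print(len(matches))  # 1: only the "second" entry is found
```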

Answer 1

I suggest:

Anchor the matches at the start of a line to make them safer (by using ^ with the MULTILINE modifier (?m))
Fix the current issue by adding an alternative that matches the very end of the string, \Z (same as Ken suggests in the comments)
Replace the very inefficient (.|\r|\n)*? pattern with .*? and add the DOTALL modifier (?s)

The whole fix will look like this:

(?sm)^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} DEBUG\s*(.*?)(?=[\r\n]+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}|\Z)

See the regex demo.
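A rough sketch of how this pattern could be used from Python follows. The sample in log and the variable names are assumptions for illustration only. Note that Group 1 holds just the message after DEBUG, so the sketch takes the whole match (group 0) to get the timestamp as well.

```python
import re

# Shortened, hypothetical sample with the same shape as the question's data.
log = """\
2018-03-20 23:28:47 INFO first
2018-03-20 23:28:47 DEBUG second {
  'x':1
}
2018-03-20 23:28:47 INFO third
2018-03-20 23:28:47 DEBUG fourth {
  'a':5
}
"""

pattern = re.compile(
    r"(?sm)^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} DEBUG\s*(.*?)"
    r"(?=[\r\n]+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}|\Z)"
)

# group(0) is the full entry including the timestamp. When the log ends
# with a newline, the last match runs up to \Z and keeps that newline,
# hence the rstrip().
entries = [m.group(0).rstrip() for m in pattern.finditer(log)]

for entry in entries:
    print(entry)
```

With this sample, entries contains two items: the "second" and the "fourth" DEBUG statements.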

Details

(?sm) - DOTALL and MULTILINE options on
^ - start of a line
\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} - a timestamp-like pattern
DEBUG - a literal substring
\s* - 0+ whitespace characters
(.*?) - Group 1: any 0+ chars, as few as possible, up to but excluding the text matched by the lookahead
(?=[\r\n]+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}|\Z) - a positive lookahead that requires either
- [\r\n]+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} - one or more CR or LF symbol(s) followed by a timestamp-like pattern
- | - or
- \Z - the very end of the string

Answer 2

If you are sure that all the paragraphs with DEBUG end with }, you can use:

r"(.*DEBUG[\s\S]*?\})"
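As a quick check of the pattern above, here is a small sketch with a made-up sample (log and blocks are illustrative names). It relies on the stated assumption: a DEBUG entry without a closing } would make [\s\S]*? run on into later entries until the next } appears.

```python
import re

# Shortened, hypothetical sample; every DEBUG entry ends with "}".
log = """\
2018-03-20 23:28:47 INFO first
2018-03-20 23:28:47 DEBUG second {
  'x':1
}
2018-03-20 23:28:47 INFO third
2018-03-20 23:28:47 DEBUG fourth{
  'a':5
}
"""

# .* stays on one line (no DOTALL), so a match can only start on a line
# that contains DEBUG; [\s\S]*? then extends lazily to the first "}".
blocks = re.findall(r"(.*DEBUG[\s\S]*?\})", log)

print(len(blocks))  # 2
```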

If a DEBUG entry may or may not have {}, the following regex should do the trick:

r"(.*DEBUG[^{\n]*)(\{[\s\S]*?\})?"

2 answers

Answer 1
2 2018-03-26 09:54:13

Answer 2
1 2018-03-26 09:54:35
