Regex start new match at specific pattern

Hello, I'm fairly new to regex and have a small, maybe simple question.

I have the following text:

17.11.2020 15:32 typical Pat. seems sleeping
Additional test

17.11.2020 15:32 typical Pat. seems sleeping
Additional test

17.11.2020 15:32 typical Pat. seems sleeping
Additional test

My current regex (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*) correctly creates 3 matches, but each one only extends to "sleeping". I also need the "Additional test" text in the second group. I tried something like (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*), but now I get only one huge match because the second group consumes everything until the end.

How can I match everything until a new line starting with a date, and begin a new match from there?

If you are sure there is only one additional line to be matched, you can use

(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)

See the regex demo. Details:

  • (?m) - a multiline modifier, so that ^ matches at the start of every line
  • ^ - start of a line
  • (\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
  • \s* - zero or more whitespace chars
  • (.*(?:\n.*)?) - Group 2: any zero or more chars other than a newline, as many as possible, then an optional second line: a newline followed by any zero or more chars other than a newline, as many as possible.
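
As a rough check of the one-extra-line pattern, here is a minimal sketch. It assumes Python's `re` module (the thread does not name a regex flavor) and uses the sample text from the question; the variable names are illustrative only.

```python
import re

# Sample text from the question: each entry has exactly one extra line.
text = (
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
)

# (?m) makes ^ match at the start of every line, not only at the start of the text.
pattern = re.compile(r"(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)")

# findall returns one (group 1, group 2) tuple per match.
matches = pattern.findall(text)

for stamp, body in matches:
    print(repr(stamp), repr(body))
```

Under these assumptions the sketch yields three matches, each with the datetime in group 1 and both text lines (joined by a newline) in group 2.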

If there can be any number of lines, you may consider

(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)

See this regex demo. Here,

  • (?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, except that \s is replaced with [\p{Zs}\t], which only matches horizontal whitespace
  • [\p{Zs}\t]* - 0+ horizontal whitespace chars
  • (?s) - from this point on, . matches any char, including a newline
  • (.*?) - Group 2: any zero or more chars, as few as possible
  • (?=\n\d{2}\.\d{2}\.\d{4}|\z) - a positive lookahead that stops the match at the leftmost newline followed by a date string, or at the very end of the string.
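
The pattern above relies on `\p{Zs}` and a mid-pattern `(?s)`, which Python's built-in `re` module does not accept as written. The following is therefore only an adaptation, not the answer's exact regex: it assumes Python, swaps `[\p{Zs}\t]` for `[^\S\r\n]` (whitespace other than line breaks), uses `\Z` as the end-of-string anchor, and passes the flags as arguments. The sample text with a varying number of extra lines is hypothetical.

```python
import re

# Hypothetical sample: the first entry has two extra lines, the second has none.
text = (
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "Second additional line\n"
    "\n"
    "18.11.2020 08:05 Pat. awake\n"
)

# Adaptation for Python's re module (an assumption of this sketch):
#   [^\S\r\n] stands in for [\p{Zs}\t], \Z stands in for \z,
#   and MULTILINE/DOTALL replace the inline (?m) and (?s).
pattern = re.compile(
    r"^(\d{2}\.\d{2}\.\d{4}[^\S\r\n]\d{2}:\d{2})[^\S\r\n]*"
    r"(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\Z)",
    re.MULTILINE | re.DOTALL,
)

# Group 2 can end with trailing newlines, so strip them for display.
entries = [(m.group(1), m.group(2).strip()) for m in pattern.finditer(text)]

for stamp, body in entries:
    print(repr(stamp), repr(body))
```

The lazy `(.*?)` grows one character at a time until the lookahead sees a newline followed by a date (or the end of the text), so each entry keeps however many lines follow it.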

The character class [,.:\w\s]* is repeated with the * quantifier, and because \s also matches newlines, it matches too much.
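
To make the difference concrete, this small sketch (assuming Python's `re` module; the variable names are illustrative) runs both of the question's patterns against the question's sample text and counts the matches.

```python
import re

# Sample text from the question.
text = (
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
)

# The question's second attempt: \s inside the class also matches newlines.
greedy_class = re.compile(r"(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*)")
# The question's first attempt: . stops at the end of the line.
single_line = re.compile(r"(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)")

class_matches = list(greedy_class.finditer(text))
line_matches = list(single_line.finditer(text))

print(len(class_matches))                   # 1
print(class_matches[0].end() == len(text))  # True: the match runs to the end of the text
print(len(line_matches))                    # 3
```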

Instead, you can match the rest of the line with .* (which does not match a newline), then match a newline and the next line in the same group, using (.*\r?\n.*).

^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)

Regex demo

If multiple lines can follow, match all following lines that do not start with a date-like pattern.

^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)

Explanation

  • ^ Start of the string (or of each line when the multiline flag is set)
  • ( Capture group 1
  • \d{2}\.\d{2}\.\d{4} Match a date-like pattern
  • ) Close group 1
  • \s* Match 0+ whitespace chars (or match whitespace chars without newlines using [^\S\r\n]*)
  • ( Capture group 2
    • .* Match the whole line
    • (?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* Optionally repeat matching a newline and the whole next line, as long as that line does not start with a date-like pattern
  • ) Close group 2
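
The negative-lookahead pattern can be tried out with a short sketch like the one below. It assumes Python's `re` module with the `re.MULTILINE` flag so that ^ matches at each line start; the sample text with a varying number of follow-up lines is hypothetical. Note that, as the pattern is written, group 1 holds only the date, so the time ends up at the start of group 2.

```python
import re

# Hypothetical sample: entries followed by two, zero, and one extra lines.
text = (
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "Another line\n"
    "18.11.2020 09:10 Pat. awake\n"
    "19.11.2020 07:45 Pat. discharged\n"
    "Follow-up note\n"
)

# Each extra line is consumed only if it does not begin with a date-like pattern.
pattern = re.compile(
    r"^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)",
    re.MULTILINE,
)

# Group 2 can end with a trailing newline, so strip it for display.
entries = [(m.group(1), m.group(2).strip()) for m in pattern.finditer(text)]

for date, body in entries:
    print(repr(date), repr(body))
```

Because every repetition is guarded by the lookahead, a new date line always ends the current match and starts the next one, regardless of how many lines came before it.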

Regex demo
