Regex start new match at specific pattern
Hello, I'm kind of new to regex and have a small, maybe simple question.
I have the following text:
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
My current regex
(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)
matches only up to "sleeping", but correctly creates 3 matches. But I also need the "Additional test" text in the second group.
I tried something like
(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*)
but now I have only one huge match, because the second group takes everything until the end.
How can I match everything up to the next line that starts with a date, and start a new match from there?
If you are sure there is only one additional line to be matched, you can use
(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)
See the regex demo. Details:
(?m) - a multiline modifier
^ - start of a line
(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
\s* - zero or more whitespaces
(.*(?:\n.*)?) - Group 2: zero or more chars other than a newline char, as many as possible, and then an optional line: a newline followed by zero or more chars other than a newline char, as many as possible.
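A minimal sketch of the pattern above in Java. The question does not name a regex engine, so java.util.regex is an assumption here (it supports (?m), \p{Zs}, \z and inline (?s), which the patterns in this thread use). The class name OneExtraLine and the method parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OneExtraLine {
    // Sample text from the question: three entries, one extra line each.
    static final String SAMPLE =
            "17.11.2020 15:32 typical Pat. seems sleeping\n"
            + "Additional test\n"
            + "17.11.2020 15:32 typical Pat. seems sleeping\n"
            + "Additional test\n"
            + "17.11.2020 15:32 typical Pat. seems sleeping\n"
            + "Additional test";

    // (?m) lets ^ match at every line start; Group 2 takes the rest of the
    // line plus at most one following line.
    static final Pattern ENTRY = Pattern.compile(
            "(?m)^(\\d{2}\\.\\d{2}\\.\\d{4}\\s\\d{2}:\\d{2})\\s*(.*(?:\\n.*)?)");

    // Returns one {datetime, text} pair per match.
    static List<String[]> parse(String input) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            entries.add(new String[] {m.group(1), m.group(2)});
        }
        return entries;
    }

    public static void main(String[] args) {
        for (String[] e : parse(SAMPLE)) {
            // Print the embedded newline as a visible \n.
            System.out.println(e[0] + " | " + e[1].replace("\n", "\\n"));
        }
    }
}
```

Each of the three matches has 17.11.2020 15:32 in Group 1 and "typical Pat. seems sleeping", a newline and "Additional test" in Group 2. A third line belonging to the same entry would not be captured.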
If there can be any number of lines, you may consider
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)
See this regex demo. Here,
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, except that \s is replaced with [\p{Zs}\t], which only matches horizontal whitespace
[\p{Zs}\t]* - 0+ horizontal whitespace chars
(?s) - from here on, . matches any char, including a newline
(.*?) - Group 2: zero or more chars, as few as possible
(?=\n\d{2}\.\d{2}\.\d{4}|\z) - up to the leftmost occurrence of a newline followed by a date string, or up to the end of the string.
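A minimal Java sketch of the any-number-of-lines pattern (java.util.regex is an assumption, since the question names no engine). The class AnyNumberOfLines, the method texts and the extra sample lines are hypothetical, chosen so that the entries have two, zero and two additional lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnyNumberOfLines {
    // Hypothetical input: entries followed by a varying number of extra lines.
    static final String SAMPLE =
            "17.11.2020 15:32 typical Pat. seems sleeping\n"
            + "Additional test\n"
            + "Second extra line\n"
            + "18.11.2020 08:05 awake\n"
            + "19.11.2020 09:00 eating\n"
            + "line a\n"
            + "line b";

    // After (?s), the lazy (.*?) may cross newlines; it stops right before
    // "\n<date>" or at the end of the input (\z).
    static final Pattern ENTRY = Pattern.compile(
            "(?m)^(\\d{2}\\.\\d{2}\\.\\d{4}[\\p{Zs}\\t]\\d{2}:\\d{2})[\\p{Zs}\\t]*"
            + "(?s)(.*?)(?=\\n\\d{2}\\.\\d{2}\\.\\d{4}|\\z)");

    // Collects Group 2 (the text of each entry).
    static List<String> texts(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : texts(SAMPLE)) {
            System.out.println(t.replace("\n", "\\n"));
        }
    }
}
```

This yields three entries, the first containing both extra lines and the second only "awake". Because the lookahead looks for \n alone, input with \r\n line endings would leave a trailing \r at the end of Group 2 for every entry except the last.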
You are using \s repeatedly: the * quantifier applies to the character class [,.:\w\s]*, and because \s also matches newlines, the class matches too much.
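To see the difference between the two attempts from the question, here is a minimal Java sketch (java.util.regex is an assumption; OverMatchDemo and group2 are hypothetical names). It runs both of the question's patterns unchanged over the sample text and collects Group 2 of each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverMatchDemo {
    // Sample text from the question.
    static final String SAMPLE =
            "17.11.2020 15:32 typical Pat. seems sleeping\n"
            + "Additional test\n"
            + "17.11.2020 15:32 typical Pat. seems sleeping\n"
            + "Additional test\n"
            + "17.11.2020 15:32 typical Pat. seems sleeping\n"
            + "Additional test";

    // The two attempts from the question, unchanged (including the unescaped dots).
    static final String FIRST_ATTEMPT = "(\\d{2}.\\d{2}.\\d{4}\\s\\d{2}:\\d{2})\\s?(.*)";
    static final String SECOND_ATTEMPT = "(\\d{2}.\\d{2}.\\d{4}\\s\\d{2}:\\d{2})\\s?([,.:\\w\\s]*)";

    // Collects Group 2 of every match of regex in input.
    static List<String> group2(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // .* stops at the line break: 3 matches, first line of each entry only.
        System.out.println(group2(FIRST_ATTEMPT, SAMPLE).size());
        // \s inside the repeated class also eats line breaks: 1 match to the end.
        System.out.println(group2(SECOND_ATTEMPT, SAMPLE).size());
    }
}
```

main prints 3 and then 1: with .* each of the three matches keeps only "typical Pat. seems sleeping", while [,.:\w\s]* consumes every line break and the single match runs to the end of the input.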
You can instead match the rest of the line with .*, which does not match a newline, and then match a newline and the next line in the same group, using (.*\r?\n.*):
^(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)
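A minimal Java sketch of the two-line pattern (java.util.regex is an assumption; TwoLineDemo and texts are hypothetical names). The answer does not say which flags its demo uses; here the pattern is compiled with Pattern.MULTILINE, because without it ^ only matches at the start of the input and only the first entry would be found. The input uses \r\n line endings to show what \r? is for: Java's . does not match \r, so .* stops before it and \r?\n consumes the whole line break.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoLineDemo {
    // The question's entries, but with Windows (\r\n) line endings.
    static final String SAMPLE_CRLF =
            "17.11.2020 15:32 typical Pat. seems sleeping\r\n"
            + "Additional test\r\n"
            + "17.11.2020 15:32 typical Pat. seems sleeping\r\n"
            + "Additional test\r\n"
            + "17.11.2020 15:32 typical Pat. seems sleeping\r\n"
            + "Additional test";

    // MULTILINE makes ^ match at each line start; \r?\n accepts \n and \r\n.
    static final Pattern ENTRY = Pattern.compile(
            "^(\\d{2}.\\d{2}.\\d{4}\\s\\d{2}:\\d{2})\\s?(.*\\r?\\n.*)", Pattern.MULTILINE);

    // Collects Group 2 (the rest of the line plus the next line).
    static List<String> texts(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(texts(SAMPLE_CRLF).size());
        // Same text with plain \n line endings.
        System.out.println(texts(SAMPLE_CRLF.replace("\r\n", "\n")).size());
    }
}
```

Both calls find three entries; with \r\n input, Group 2 keeps the \r\n between the two lines.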
If multiple lines can follow, match all following lines that do not start with a date-like pattern.
^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)
Explanation
^ - Start of the string (start of a line when the multiline flag is enabled)
( - Capture group 1
\d{2}\.\d{2}\.\d{4} - Match a date-like pattern
) - Close group 1
\s* - Match 0+ whitespace chars (or match whitespace chars without newlines using [^\S\r\n]*)
( - Capture group 2
.* - Match the rest of the line
(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* - Optionally repeat matching the whole next line if it does not start with a date-like pattern
) - Close group 2
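A minimal Java sketch of the pattern above (java.util.regex is an assumption; FollowingLinesDemo, parse and the extra sample lines are hypothetical). It is compiled with Pattern.MULTILINE so that ^ matches at the start of every line rather than only at the start of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FollowingLinesDemo {
    // Hypothetical input: entries followed by a varying number of extra lines.
    static final String SAMPLE =
            "17.11.2020 15:32 typical Pat. seems sleeping\n"
            + "Additional test\n"
            + "Second extra line\n"
            + "18.11.2020 08:05 awake\n"
            + "19.11.2020 09:00 eating\n"
            + "line a\n"
            + "line b";

    // The repeated group takes one more line at a time, as long as the
    // negative lookahead finds no date at the start of that line.
    static final Pattern ENTRY = Pattern.compile(
            "^(\\d{2}\\.\\d{2}\\.\\d{4})\\s*(.*(?:\\r?\\n(?!\\d{2}\\.\\d{2}\\.\\d{4}).*)*)",
            Pattern.MULTILINE);

    // Returns one {date, text} pair per match.
    static List<String[]> parse(String input) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            entries.add(new String[] {m.group(1), m.group(2)});
        }
        return entries;
    }

    public static void main(String[] args) {
        for (String[] e : parse(SAMPLE)) {
            System.out.println(e[0] + " | " + e[1].replace("\n", "\\n"));
        }
    }
}
```

Group 1 holds only the date. Because this pattern leaves the time out of Group 1, each Group 2 starts with the time, e.g. "15:32 typical Pat. seems sleeping" followed by both extra lines for the first entry.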