Regex start new match at specific pattern

Hello, I'm fairly new to regex and have a small, maybe simple question.

I have the given text:

17.11.2020 15:32 typical Pat. seems sleeping
Additional test

17.11.2020 15:32 typical Pat. seems sleeping
Additional test

17.11.2020 15:32 typical Pat. seems sleeping
Additional test

My current regex (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*) correctly creates 3 matches, but each one stops at "sleeping". I also need the "Additional test" text in the second group. I tried something like (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*) but now I get only one huge match, because the second group takes everything until the end.

How can I match everything up to the next line that starts with a date, and begin a new match from there?
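The two behaviours described in the question can be reproduced with a short sketch. Python's `re` module is an assumption here (the question does not name a regex engine), and the variable names are only illustrative.

```python
import re

# Sample text from the question.
text = (
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
)

# First attempt: "." never crosses a newline, so group 2 stops at the end of the line.
first = re.findall(r"(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)", text)

# Second attempt: \s inside the class also matches newlines, and every
# character in the text belongs to the class, so group 2 runs to the end.
second = re.findall(r"(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*)", text)

print(len(first), len(second))
```

Here `first` holds three tuples whose second element is only the first line of each record, while `second` holds a single tuple that swallows all three records.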

If you are sure there is only one additional line to be matched you can use

(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)

See the regex demo. Details:

  • (?m) - a multiline modifier
  • ^ - start of a line
  • (\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
  • \s* - zero or more whitespaces
  • (.*(?:\n.*)?) - Group 2: zero or more chars other than a newline, as many as possible, then an optional second line: a newline followed by zero or more chars other than a newline, as many as possible.
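As a rough check of the pattern broken down above, here is a minimal sketch that assumes Python's `re` module (the answer itself does not name an engine). The pattern is used unchanged; `text` and `matches` are illustrative names.

```python
import re

text = (
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
)

# Group 1: the datetime; group 2: the rest of the line plus one optional extra line.
pattern = re.compile(r"(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)")

matches = pattern.findall(text)
for stamp, body in matches:
    print(repr(stamp), repr(body))
```

Each of the three tuples pairs the datetime with both lines of its record, joined by a newline.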

If there can be any amount of lines, you may consider

(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)

See this regex demo. Here,

  • (?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, just \s is replaced with [\p{Zs}\t] that only matches horizontal whitespace
  • [\p{Zs}\t]* - 0+ horizontal whitespace chars
  • (?s) - now, . will match any chars including a newline
  • (.*?) - Group 2: any zero or more chars, as few as possible
  • (?=\n\d{2}\.\d{2}\.\d{4}|\z) - a positive lookahead that stops Group 2 at the leftmost newline followed by a date string, or at the very end of the string.
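The pattern above is written for a PCRE-style engine. As a hedged sketch, it can be adapted to Python's `re` module, which is an assumption on my part and needs three substitutions: `[ \t]` stands in for `[\p{Zs}\t]` (Python's `re` has no `\p{...}` classes, so this covers only the ASCII space and tab), the scoped group `(?s:...)` replaces the mid-pattern `(?s)`, and `\Z` replaces `\z`. The sample text with a varying number of continuation lines is hypothetical.

```python
import re

# Hypothetical sample: records with two, zero and one continuation lines.
text = (
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "Second extra line\n"
    "\n"
    "18.11.2020 09:05 single line entry\n"
    "\n"
    "19.11.2020 11:45 last entry\n"
    "Additional test\n"
)

pattern = re.compile(
    r"(?m)^(\d{2}\.\d{2}\.\d{4}[ \t]\d{2}:\d{2})[ \t]*"  # group 1: datetime
    r"((?s:.*?))"                                        # group 2: lazy, may span lines
    r"(?=\n\d{2}\.\d{2}\.\d{4}|\Z)"                      # stop before next date or at end
)

# Group 2 keeps the trailing newline before the next record, so strip it.
records = [(m.group(1), m.group(2).strip()) for m in pattern.finditer(text)]
for stamp, body in records:
    print(stamp, repr(body))
```

Because the lazy group stops only at the lookahead, the blank separator line's newline ends up inside group 2. Stripping it afterwards is one simple way to deal with that.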

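The character class [,.:\w\s]* repeats \s with the * quantifier. Since \s also matches newlines, the class runs across line boundaries and matches too much.

Instead, you can capture exactly two lines with (.*\r?\n.*): the first .* matches the rest of the current line (the dot does not match a newline), \r?\n matches the line break, and the second .* matches the next line, all in the same group. The dots in the date are also escaped below so that they match only a literal dot, and the pattern needs the multiline flag so that ^ matches at the start of every line.

^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)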

Regex demo

If multiple lines can follow, match all following lines that do not start with a date like pattern.

^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)

Explanation

  • ^ Start of the string (or of each line when the multiline flag is enabled, which is needed to find every record)
  • ( Capture group1
  • \d{2}\.\d{2}\.\d{4} Match a date like pattern
  • ) Close group 1
  • \s* Match 0+ whitespace chars (Or match whitespace chars without newlines [^\S\r\n]* )
  • ( Capture group 2
    • .* Match the whole line
    • (?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* Optionally repeat matching a newline and the whole next line, as long as that line does not start with a date like pattern (negative lookahead)
  • ) Close group 2
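The negative-lookahead pattern explained above can be tried with a minimal sketch, again assuming Python's `re` module with the multiline flag. The sample text is hypothetical. Note that group 1 in this pattern holds only the date, so the time ends up at the start of group 2.

```python
import re

# Hypothetical sample: records with two, zero and one continuation lines.
text = (
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "Second extra line\n"
    "\n"
    "18.11.2020 09:05 single line entry\n"
    "\n"
    "19.11.2020 11:45 last entry\n"
    "Additional test\n"
)

# Keep consuming following lines while they do NOT start with a date.
pattern = re.compile(
    r"^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)",
    re.MULTILINE,
)

# Blank separator lines are not dates either, so they are consumed too;
# strip() removes the trailing newline they leave in group 2.
records = [(m.group(1), m.group(2).strip()) for m in pattern.finditer(text)]
for date, body in records:
    print(date, repr(body))
```

Compared with a lazy dot plus lookahead, this form checks the lookahead once per line rather than once per character, which tends to be cheaper on long inputs.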

Regex demo
