Return substring of all lines that start with [+] between two specific lines

I have a sample multi-line string that looks like this:

[+] x: somerandomstuff
[!] blah
[+] x: somemorerandomstuff
[-] blah
[+] START
[+] x: 1st group to match
[!] blah
[-] blah
[+] x: 2nd group to match
[+] END

I want to match the strings after the x: in lines that look like [+] x: (...), but only those that are between [+] START and [+] END. The expected result would be two groups (there could be more):

1st group to match
2nd group to match

Note that there will only be one instance of START/END.

I've only managed to come up with something that matches the first group:

\[\+\] START.*?\[\+\] x: (.*?)\n.*\[\+\] END

I currently lack the knowledge to extend this regex to match the other lines. I'm not sure how to look for multiple lines that match a pattern between two other patterns ([+] START and [+] END).
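For readers who want to reproduce the behaviour described above, here is a minimal sketch using Python's built-in `re` module. It assumes the attempt was run with the dot-matches-newline flag (`re.DOTALL` here), since otherwise `.*?` could not cross line boundaries; the names `SAMPLE` and `ATTEMPT` are only illustrative.

```python
import re

# The sample input from the question, joined with "\n" line endings.
SAMPLE = "\n".join([
    "[+] x: somerandomstuff",
    "[!] blah",
    "[+] x: somemorerandomstuff",
    "[-] blah",
    "[+] START",
    "[+] x: 1st group to match",
    "[!] blah",
    "[-] blah",
    "[+] x: 2nd group to match",
    "[+] END",
])

# The attempt from the question. Assumption: dot matches newlines.
ATTEMPT = re.compile(r"\[\+\] START.*?\[\+\] x: (.*?)\n.*\[\+\] END", re.DOTALL)

# A single match consumes everything from START to END, so the one
# capture group can only ever hold the first "x:" line.
first_only = ATTEMPT.findall(SAMPLE)
print(first_only)  # ['1st group to match']
```

Because the whole START...END block is consumed by one match, repeating the search cannot yield the second line; each wanted line has to become its own match.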

REGEX101 Link: https://regex101.com/r/kCgwhr/2

Note: I know that a regex-only solution may not be the best approach here, but I would like to solve this with regex only.

I assume that you are using a PCRE-compatible regex engine, as you are using regex101 in PCRE mode.

You can make use of \G continuous matching (and some lookaheads) to match what you want:

(?:\[\+\] START|\G(?!\A))\R(?:(?!\[\+\] x:)(?!\[\+\] END).*\R)*\[\+\] x:\s*\K.*

This matches:

  • (?:\[\+\] START|\G(?!\A)) - the start sequence, or the position right after the previous match. \G also matches at the start of the string the first time the regex is applied, so (?!\A) ensures that \G is only used after the first match has been found.
  • \R - any newline sequence
  • (?:(?!\[\+\] x:)(?!\[\+\] END).*\R)* - any number of lines that start with neither the end sequence nor the sequence we want to match (basically to skip over them)
  • \[\+\] x:\s* - the start of the line we want to match
  • \K - discards everything matched so far (so the reported match contains only what we really want)
  • .* - the content of the wanted line

See it working on regex101.
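Engines without \G, \K and \R (Python's built-in `re` module is one) cannot run the pattern above as written. As a hedged alternative, the sketch below uses a different technique, not the \G approach: it relies on the question's guarantee that there is exactly one START/END pair, so a wanted line is any `[+] x:` line that is followed by `[+] END` but no longer followed by `[+] START`. The names `SAMPLE` and `BETWEEN` are illustrative, and "\n" line endings are assumed.

```python
import re

# The sample input from the question, joined with "\n" line endings.
SAMPLE = "\n".join([
    "[+] x: somerandomstuff",
    "[!] blah",
    "[+] x: somemorerandomstuff",
    "[-] blah",
    "[+] START",
    "[+] x: 1st group to match",
    "[!] blah",
    "[-] blah",
    "[+] x: 2nd group to match",
    "[+] END",
])

BETWEEN = re.compile(
    r"^\[\+\] x:[ \t]*(.*)$"        # a wanted line; group 1 is its content
    r"(?![\s\S]*^\[\+\] START$)"    # no START line further down ...
    r"(?=[\s\S]*^\[\+\] END$)",     # ... but an END line still ahead
    re.MULTILINE,
)

groups = BETWEEN.findall(SAMPLE)
print(groups)  # ['1st group to match', '2nd group to match']
```

Each candidate line rescans the remainder of the input in its lookaheads, so this trades some efficiency for portability; on a PCRE engine the \G-based pattern avoids that rescanning.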
