Regex match repeating pattern only after string

Let a PropDefinition be a string of the form prop\d+ (true|false).

I have a string like:

((prop5 true))

sat
((prop0 false)
 (prop1 false)
 (prop2 true))

I'd like to extract only the PropDefinitions that come after the text 'sat', so the matches should be:

prop0 false
prop1 false
prop2 true

I originally tried /(prop\d (?:true|false))/s (see the example on Rubular), but that obviously matches all PropDefinitions, and I couldn't make it match only the ones that follow the sat string.

I used Rubular for the example above because it was convenient, but I'm really looking for the most language-agnostic solution. If it's vital info, I'll most likely be using the regex in a Java application.

str =<<-Q
((prop5 true))

sat
((prop0 false)
 (prop1 false)
 (prop2 true))
Q

p str[/^sat(.*)/m, 1].scan(/prop\d+ (?:true|false)/)

# => ["prop0 false", "prop1 false", "prop2 true"]

When you have patterns that are very different in nature, as in this case (finding the text after sat, and selecting the specific patterns within it), it is usually better to express them as multiple regexes rather than trying to do it with a single regex.
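As a minimal sketch of the same two-step idea, the snippet below wraps it in a method so that input without a sat line yields an empty array instead of calling scan on nil. The method name props_after_sat is hypothetical, and the sketch assumes sat starts a line of its own, as in the question's sample.

```ruby
# Hypothetical helper: return the "propN true|false" pairs that appear
# after the first line starting with "sat".
def props_after_sat(text)
  # Step 1: isolate everything after "sat". In Ruby, ^ matches at the start
  # of any line, and the /m flag lets . match newlines.
  tail = text[/^sat\b(.*)/m, 1]
  return [] if tail.nil?

  # Step 2: collect every definition found in that tail.
  tail.scan(/prop\d+ (?:true|false)/)
end

sample = "((prop5 true))\n\nsat\n((prop0 false)\n (prop1 false)\n (prop2 true))\n"
p props_after_sat(sample)            # => ["prop0 false", "prop1 false", "prop2 true"]
p props_after_sat("((prop5 true))")  # => []
```

Keeping the two steps separate means each regex stays short enough to check by eye.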

s = <<_
((prop5 true))

sat
((prop0 false)
 (prop1 false)
 (prop2 true))
_

s.split(/^sat\s+/, 2).last.scan(/prop\d+ (?:true|false)/)
# => ["prop0 false", "prop1 false", "prop2 true"]

Part of the confusion has to do with SingleLine vs. MultiLine matching. The patterns below work for me and return all matches in a single execution, without requiring a preliminary operation to split the string.

This one requires SingleLine mode to be specified separately (as with RegexOptions in .NET):

(?<=sat.*)(prop\d (?:true|false))

This one specifies SingleLine mode inline, which works with many, but not all, regex engines:

(?s)(?<=sat.*)(?-s)(prop\d (?:true|false))

You don't need to turn SingleLine mode off via (?-s), but I think it makes the intent clearer.

The following pattern also toggles SingleLine mode inline, but uses a negative lookahead instead of a positive lookbehind. According to regular-expressions.info (be sure to select Ruby and Java from the drop-downs), the Ruby engine supports lookbehind, positive or negative, only in some versions, and even then doesn't allow quantifiers inside it (also noted by @revo in a comment). This pattern should work in Java, .NET, most likely Ruby, and others:

(prop\d (?:true|false))(?s)(?!.*sat)(?-s)
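For Ruby specifically, a rough sketch of both points follows: the quantified lookbehind is rejected when the pattern is compiled, while the negative-lookahead form runs. The helper names are hypothetical. The sketch uses Ruby's /m flag, which is Ruby's spelling of "dot matches newline", and \d+ as in the question's definition.

```ruby
# Hypothetical helper: true if Ruby's regex engine accepts the pattern source.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Hypothetical helper: keep a definition only if no "sat" occurs anywhere after it.
def props_with_no_sat_after(text)
  # With /m, .* in the lookahead can scan across line breaks.
  text.scan(/prop\d+ (?:true|false)(?!.*sat)/m)
end

sample = "((prop5 true))\n\nsat\n((prop0 false)\n (prop1 false)\n (prop2 true))\n"
p compiles?('(?<=sat.*)prop\d')    # => false
p props_with_no_sat_after(sample)  # => ["prop0 false", "prop1 false", "prop2 true"]
```

The lookahead treats any later occurrence of the letters sat as the marker, so this only suits input where sat appears exactly once.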

\s+[(]+\K(prop\d (?:true|false)(?=[)]))

Live demo

If Ruby supports the \G anchor, this is one solution.
It looks nasty, but several things are going on:
1. It only allows a single level of nesting (one outer form containing many inner forms).
2. It will not match invalid forms that don't comply with '(prop\d true|false)'.

Without condition 2 it would be a lot easier, which is an indicator that a two-regex solution would do the same: first capture the outer form sat((..)..(..)..), then globally capture the inner form (prop\d true|false).
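A minimal sketch of that two-regex alternative in Ruby might look like the following. The method name props_in_sat_block is hypothetical, and the sketch assumes the first occurrence of sat in the input is the marker and that inner forms are not nested any deeper.

```ruby
# Hypothetical helper illustrating the two-regex alternative.
def props_in_sat_block(text)
  # Regex 1: the outer form, "sat" followed by one parenthesised block
  # that contains only non-nested inner "(..)" groups.
  outer = text[/sat\s*\((?:[^()]*\([^()]*\))*[^()]*\)/]
  return [] if outer.nil?

  # Regex 2: every well-formed inner "(propN true|false)" inside that block.
  # scan returns one array per match because of the capture group, hence flatten.
  outer.scan(/\((prop\d+ (?:true|false))\)/).flatten
end

data = "((prop10 true))\nsat\n((prop3 false)\n(asdg)\n\n(propa false)\n\n (prop1 false)\n (prop2 true)\n)\n((prop5 true))\n"
p props_in_sat_block(data)  # => ["prop3 false", "prop1 false", "prop2 true"]
```

Because the first regex stops at the closing parenthesis of the sat block, the trailing ((prop5 true)) group is never seen by the second regex, and malformed inner forms such as (asdg) simply fail to match it.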

It can be done in a single regex, though it is going to be hard to look at. It should work (Perl test case below).

# (?:(?!\A|sat\s*\()\G|sat\s*\()[^()]*(?:\((?!prop\d[ ](?:true|false)\))[^()]*\)[^()]*)*\((prop\d[ ](?:true|false))\)(?=(?:[^()]*\([^()]*\))*[^()]*\))

 (?:
      (?! \A | sat \s* \( )
      \G                            # Start match from end of last match
   |                              # or,
      sat \s* \(                    # Start form 'sat ('
 )
 [^()]*                        # This check section consumes invalid inner '(..)' forms
 (?:                           # since we are looking specifically for '(prop\d true|false)'
      \( 
      (?!
           prop \d [ ] 
           (?: true | false )
           \)
      )
      [^()]* 
      \)
      [^()]* 
 )*                            # End section, do optionally many times
 \( 
 (                             # (1 start), match inner form '(prop\d true|false)'
      prop \d [ ] 
      (?: true | false )
 )                             # (1 end)
 \)
 (?=                           # Look ahead for end form  '(..)(..))'
      (?:
           [^()]* 
           \( [^()]* \)
      )*
      [^()]* 
      \) 
 )

Perl test case

$/ = undef;

$str = <DATA>;

while ($str =~ /(?:(?!\A|sat\s*\()\G|sat\s*\()[^()]*(?:\((?!prop\d[ ](?:true|false)\))[^()]*\)[^()]*)*\((prop\d[ ](?:true|false))\)(?=(?:[^()]*\([^()]*\))*[^()]*\))/g)
{
   print "'$1'\n";
}

__DATA__
((prop10 true))
sat
((prop3 false)
(asdg) 

(propa false)

 (prop1 false)
 (prop2 true)
)
((prop5 true))

Output >>

'prop3 false'
'prop1 false'
'prop2 true'

/(?<=sat).*?(prop\d (true|false))/m

Match group 1 is what you want. See example.

BUT, I would really recommend splitting the string first. It's much easier.
