Regex match repeating pattern only after string

Let a PropDefinition be a string of the form prop\d+ (true|false).

I have a string like:

((prop5 true))

sat
((prop0 false)
 (prop1 false)
 (prop2 true))

I'd like to extract the bottom PropDefinitions only after the text 'sat', so the matches should be:

prop0 false
prop1 false
prop2 true

I originally tried /(prop\d (?:true|false))/s (see example here), but that obviously matches every PropDefinition, and I couldn't make it match only the ones that come after the sat string.

I used Rubular for the example above because it was convenient, but I'm really looking for the most language-agnostic solution. If it's vital info, I'll most likely be using the regex in a Java application.

str =<<-Q
((prop5 true))

sat
((prop0 false)
 (prop1 false)
 (prop2 true))
Q

p str[/^sat(.*)/m, 1].scan(/prop\d+ (?:true|false)/)

# => ["prop0 false", "prop1 false", "prop2 true"]

When you have patterns that are very different in nature as in this case (string after sat and selecting the specific patterns), it is usually better to express them in multiple regexes rather than trying to do it with a single regex.

s = <<_
((prop5 true))

sat
((prop0 false)
 (prop1 false)
 (prop2 true))
_

s.split(/^sat\s+/, 2).last.scan(/prop\d+ (?:true|false)/)
# => ["prop0 false", "prop1 false", "prop2 true"]

Part of the confusion has to do with SingleLine vs MultiLine matching. The patterns below work for me and return all matches in a single execution and without requiring a preliminary operation to split the string.

This one requires SingleLine mode to be specified separately (as with .NET's RegexOptions):

(?<=sat.*)(prop\d (?:true|false))

This one specifies SingleLine mode inline which works with many, but not all, RegEx engines:

(?s)(?<=sat.*)(?-s)(prop\d (?:true|false))

You don't need to turn SingleLine mode off via (?-s), but I think doing so makes the intent clearer.

The following pattern also toggles SingleLine mode inline, but uses a negative lookahead instead of a positive lookbehind. According to regular-expressions.info (be sure to select Ruby and Java from the drop-downs), the Ruby engine supports lookbehinds, positive or negative, only in some versions, and even then doesn't allow quantifiers inside them (also noted by @revo in a comment below). This pattern should work in Java, .NET, most likely Ruby, and others:

(prop\d (?:true|false))(?s)(?!.*sat)(?-s)
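
For what it's worth, here is a minimal Ruby sketch of the same negative-lookahead idea. It is an adaptation rather than the exact pattern above: in Ruby the /m flag is what makes the dot match newlines, so the inline (?s) and (?-s) toggles are replaced by that flag, and \d+ is used so that multi-digit names are accepted. The method name props_with_no_sat_after is made up for illustration.

```ruby
# Hypothetical helper: keep only PropDefinitions with no "sat" anywhere after them.
def props_with_no_sat_after(text)
  # /m is Ruby's "dot matches newline" flag, so (?!.*sat) scans to the end of the text.
  text.scan(/(prop\d+ (?:true|false))(?!.*sat)/m).flatten
end

input = <<INPUT
((prop5 true))

sat
((prop0 false)
 (prop1 false)
 (prop2 true))
INPUT

p props_with_no_sat_after(input)
# => ["prop0 false", "prop1 false", "prop2 true"]
```

Note that this treats any later occurrence of the letters "sat" as the marker, so it assumes the marker does not also show up inside other text after the wanted block.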

\s+[(]+\K(prop\d (?:true|false)(?=[)]))

Live demo

If Ruby supports the \G anchor, this is one solution.
It looks nasty, but several things are going on:
1. It only allows a single level of nesting (one outer form containing many inner forms).
2. It will not match invalid forms that don't comply with '(prop\d true|false)'.

Without condition 2 it would be a lot easier, which suggests that a two-regex solution would do the same job: the first regex captures the outer form sat((..)..(..)..), and the second globally captures the inner form (prop\d true|false).
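
The two-regex idea just described could be sketched in Ruby roughly as follows. This is only an illustration under assumptions: the method name prop_definitions_after_sat is invented, \d+ is used instead of \d to follow the question's definition, and the outer form is assumed to contain no nested parentheses beyond one inner level.

```ruby
# Hypothetical helper illustrating the two-step approach.
def prop_definitions_after_sat(text)
  # Step 1: capture the body of the outer form that follows "sat",
  # i.e. "(" followed by any number of flat "(...)" inner forms and a closing ")".
  outer = text[/^sat\s*\(((?:\s*\([^()]*\))*)\s*\)/, 1]
  return [] if outer.nil?

  # Step 2: globally capture only the inner forms that are valid PropDefinitions.
  outer.scan(/\((prop\d+ (?:true|false))\)/).flatten
end

input = <<INPUT
((prop5 true))

sat
((prop0 false)
 (prop1 false)
 (prop2 true))
INPUT

p prop_definitions_after_sat(input)
# => ["prop0 false", "prop1 false", "prop2 true"]
```

Invalid inner forms such as (asdg) are captured by step 1 but simply never match in step 2, which is what makes the split version easier to read.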

It can be done in a single regex. It is going to be hard to look at, but it should work (test case below in Perl).

# (?:(?!\A|sat\s*\()\G|sat\s*\()[^()]*(?:\((?!prop\d[ ](?:true|false)\))[^()]*\)[^()]*)*\((prop\d[ ](?:true|false))\)(?=(?:[^()]*\([^()]*\))*[^()]*\))

 (?:
      (?! \A | sat \s* \( )
      \G                            # Start match from end of last match
   |                              # or,
      sat \s* \(                    # Start form 'sat ('
 )
 [^()]*                        # This check section consumes invalid inner '(..)' forms
 (?:                           # since we are looking specifically for '(prop\d true|false)'
      \( 
      (?!
           prop \d [ ] 
           (?: true | false )
           \)
      )
      [^()]* 
      \)
      [^()]* 
 )*                            # End section, do optionally many times
 \( 
 (                             # (1 start), match inner form '(prop\d true|false)'
      prop \d [ ] 
      (?: true | false )
 )                             # (1 end)
 \)
 (?=                           # Look ahead for end form  '(..)(..))'
      (?:
           [^()]* 
           \( [^()]* \)
      )*
      [^()]* 
      \) 
 )
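
To show just the \G chaining at the heart of the expanded pattern above, here is a deliberately simplified Ruby sketch. It assumes Ruby's String#scan restarts \G at the end of the previous match, which is the documented behaviour, and it leaves out both the skipping of invalid inner forms and the closing lookahead. The names CHAINED and chained_props are hypothetical, and \d+ is used instead of \d.

```ruby
# Simplified sketch: a match must either start at "sat (" or continue
# directly from where the previous match ended (\G, but not at the string start).
CHAINED = /(?:sat\s*\(|\G(?!\A))\s*\((prop\d+ (?:true|false))\)/

def chained_props(text)
  text.scan(CHAINED).flatten
end

input = <<INPUT
((prop5 true))

sat
((prop0 false)
 (prop1 false)
 (prop2 true))
INPUT

p chained_props(input)
# => ["prop0 false", "prop1 false", "prop2 true"]
```

Because nothing here consumes invalid forms, the chain stops at the first inner form that is not a PropDefinition. The full pattern avoids that with its extra "check section".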

Perl test case

$/ = undef;

$str = <DATA>;

while ($str =~ /(?:(?!\A|sat\s*\()\G|sat\s*\()[^()]*(?:\((?!prop\d[ ](?:true|false)\))[^()]*\)[^()]*)*\((prop\d[ ](?:true|false))\)(?=(?:[^()]*\([^()]*\))*[^()]*\))/g)
{
   print "'$1'\n";
}

__DATA__
((prop10 true))
sat
((prop3 false)
(asdg) 

(propa false)

 (prop1 false)
 (prop2 true)
)
((prop5 true))

Output >>

'prop3 false'
'prop1 false'
'prop2 true'

/(?<=sat).*?(prop\d (true|false))/m

Match group 1 is what you want. See example.

BUT, I would really recommend splitting the string first. It's much easier.
