
Capturing phrases excluding variable phrase

I have a regular expression to capture phrases, which are mutually exclusive groups of two words (each word in the string is captured at most once). I'm trying to exclude a specific (variable) phrase from the captured groups. The regex /\w+\s+\w+/ provided by @Casimir partitions the string and matches the groups as desired, but we also need to exclude a phrase that could occur anywhere in the string, possibly multiple times.

For the string

'next saturday, swing dancing at the kato ballroom! bring friends!'

and the phrase 'swing dancing', the regex should return every group shown below except 'wing dancing'.

Test cases:

"next saturday, swing dancing at the kato ballroom! bring friends!".
  scan(/((?!swing dancing)(?:\w+)\s(?!swing dancing)(?:\w+))/)
=> [["next saturday"], ["wing dancing"], ["at the"], ["kato ballroom"], ["bring friends"]]

Rubular: http://rubular.com/r/Eogo29Ociz

"next saturday, swing dancing at the kato ballroom! come dancing with friends!"
  .scan(/((?!dancing)(?:\w+)\s(?!dancing)(?:\w+))/)
=> [["next saturday"], ["ancing at"], ["the kato"], ["ancing with"]]

Rubular: http://rubular.com/r/1TpcveiuX0

That should return

[["next saturday"], ["at the"], ["kato ballroom"], ["with friends"]] 

The regular expression may not need to repeat the negative lookahead, as long as it matches the phrases on either side of the phrase to exclude.

I'd like the regex to be entirely case-insensitive, for both the negative lookahead and the matched results. I tried the /i option, but I could also downcase the strings beforehand (the test strings above are already lowercase).

Why is the regex not working, and do you have suggestions for improving it?
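Part of the "why" can be checked in isolation. A negative lookahead such as (?!swing dancing) only tests the position where a match attempt starts. When the attempt at the "s" of "swing" is vetoed, scan simply retries one character later, and at "wing" the lookahead no longer sees the forbidden text. The small check below uses strings shortened from the test cases; it is an illustration, not code from the thread:

```ruby
# A negative lookahead only vetoes a match that *starts* at the excluded text.
# After the attempt at index 0 fails, the engine retries at index 1 ("wing ..."),
# where (?!swing dancing) no longer applies.
start = "swing dancing" =~ /(?!swing dancing)\w+\s\w+/
text  = "swing dancing"[/(?!swing dancing)\w+\s\w+/]

# Same effect with the single-word exclusion: "ancing at" slips through.
pieces = "dancing at".scan(/(?!dancing)\w+\s\w+/)

p start   # => 1
p text    # => "wing dancing"
p pieces  # => ["ancing at"]
```

This is why stepping around the phrase with lookaheads keeps leaking fragments: every later start position inside the phrase is a fresh chance to match.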

Use a capture group to isolate the target, and put the string you don't want in an optional non-capturing group in front of it: /\b(?:swing\s+dancing\W+)?(\w+\s+\w+)/

> "next saturday, swing dancing at the kato ballroom! bring friends!".scan(/\b(?:swing\s+dancing\W+)?(\w+\s+\w+)/)
=> [["next saturday"], ["at the"], ["kato ballroom"], ["bring friends"]] 

(demo on Rubular)
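Since the phrase to exclude is variable, the pattern can be built at runtime. The sketch below uses a hypothetical helper name, phrases_excluding, and assumes a non-empty phrase. It escapes each word with Regexp.escape, allows any whitespace between the words, and adds the /i flag for the case-insensitivity asked about in the question; otherwise it is the same capture-group pattern as above.

```ruby
# Hypothetical helper (not from the original answer): builds the
# capture-group pattern above from an arbitrary, non-empty phrase.
def phrases_excluding(text, phrase)
  # Escape regex metacharacters in each word and allow any run of
  # whitespace between the words of the excluded phrase.
  excluded = phrase.split.map { |word| Regexp.escape(word) }.join('\s+')
  # The optional group consumes the excluded phrase (plus trailing
  # non-word characters) so it never ends up in the capture group.
  pattern = /\b(?:#{excluded}\W+)?(\w+\s+\w+)/i
  # With a capture group, String#scan returns one array per match;
  # flatten turns [["a b"], ["c d"]] into ["a b", "c d"].
  text.scan(pattern).flatten
end

result = phrases_excluding(
  "Next Saturday, Swing Dancing at the Kato Ballroom! Bring friends!",
  "swing dancing"
)
p result  # => ["Next Saturday", "at the", "Kato Ballroom", "Bring friends"]
```

The escaped phrase is interpolated as a string rather than as a Regexp object on purpose: an interpolated Regexp keeps its own flags (it is embedded as (?-mix:...)), so the outer /i would not apply to the excluded phrase.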

Or, with the \K feature (supported by Ruby's regex engine, Onigmo, since Ruby 2.0): /\b(?:swing\s+dancing\W+)?\K\w+\s+\w+/

> "next saturday, swing dancing at the kato ballroom! bring friends!".scan(/\b(?:swing\s+dancing\W+)?\K\w+\s+\w+/)
=> ["next saturday", "at the", "kato ballroom", "bring friends"] 

The two approaches are similar. They don't try to avoid "swing dancing"; on the contrary, they try to match it first, so the regex engine consumes it. The remaining task is then to exclude it from the result.

The first pattern uses a capture group (because String#scan returns only the capture groups when the pattern contains any), and the second pattern uses \K to say "don't include anything before this point in the match".
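The answer does not cover the question's second test case, where the excluded phrase is the single word "dancing" and the word left over in front of it ("swing", "come") should be dropped as well. One possible extension of the same consume-then-discard idea, sketched here and checked only against the string from the question, is to let the optional group also swallow one word before the excluded phrase:

```ruby
# Sketch (not from the original answer): like the capture-group pattern,
# but the optional group may also consume ONE word in front of the excluded
# phrase, so an orphan word such as "swing" or "come" is discarded with it.
excluded = Regexp.escape("dancing")   # the phrase to drop
pattern  = /\b(?:(?:\w+\W+)?#{excluded}\W+)?(\w+\s+\w+)/i

text  = "next saturday, swing dancing at the kato ballroom! come dancing with friends!"
pairs = text.scan(pattern).flatten
p pairs  # => ["next saturday", "at the", "kato ballroom", "with friends"]
```

The trade-off is that the word directly before the excluded phrase is always thrown away when it has not already been paired, which matches the expected output in the question but may not suit every input; for example, if the phrase ends the string with no trailing punctuation, the optional group cannot match and the last word pairs with the phrase itself.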
