Struggling with a regex for matching inner quote+parenthesis. Do I need negative/positive look-ahead/behind?

I'm trying to run a regex against the following strings:

  1. "sonoma wildfires"
  2. sonoma and (wild* or stratus or kincade)
  3. sonoma and (wild or "stratus kincade")

... so that I get the following matches:

  1. ['"sonoma wildfires"']
  2. ['sonoma', 'and', '(wild* or stratus or kincade)']
  3. ['sonoma', 'and', '(wild or "stratus kincade")']

I'm using the following regex:

/\w+\*?|["(][^()"]+[")]/g

The first two strings match correctly.

But with the third string, I get this match:

['sonoma', 'and', '(wild or "', 'stratus', 'kincade']

... and what I want is:

['sonoma', 'and', '(wild or "stratus kincade")']

It matches from the opening parenthesis but stops at the first inner quote. I've been tweaking the regex with negative and positive lookarounds, but I'm having trouble figuring it out.

/\w+\*?|["(](?<?\()[^()"]+(?!\))[")]/g

If these three cases are the only shapes of input you need to handle, you can try this:

/(\w+) +(and) +(\(.+\))|(\".+\")/g

It will look for:

  • word and ( expression )
  • " expression "

You can test it on RegExr: https://regexr.com/5adgh
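As a rough sketch of how this pattern behaves, the snippet below collects the non-empty capture groups for each sample string. The helper name `captureGroups` is made up for illustration and is not part of the answer. Note that the pattern only recognises the literal word `and` between the first term and the parenthesised part, and that `.+` is greedy, so it runs to the last `)` or `"` on the line.

```javascript
// The pattern from the answer above: either
//   word + "and" + parenthesised expression (groups 1-3), or
//   a double-quoted expression (group 4).
const pattern = /(\w+) +(and) +(\(.+\))|(\".+\")/g;

// Hypothetical helper: returns the capture groups that took part in a match.
function captureGroups(input) {
  const result = [];
  for (const m of input.matchAll(pattern)) {
    // m[0] is the whole match; m[1..4] are the groups.
    // Groups from the alternative that did not match are undefined.
    result.push(...m.slice(1).filter((g) => g !== undefined));
  }
  return result;
}

console.log(captureGroups('"sonoma wildfires"'));
console.log(captureGroups('sonoma and (wild* or stratus or kincade)'));
console.log(captureGroups('sonoma and (wild or "stratus kincade")'));
```

Because the groups are positional, a query that uses `or` instead of `and` at the top level would not match the first alternative at all.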

[edit]

Sorry, I had forgotten the capturing groups.

The first pattern that you tried, \w+\*?|["(][^()"]+[")], does not give the desired match. The second part of the alternation starts with ["(], which matches either of the listed characters, so in the third example it matches the opening parenthesis (.

The next part, [^()"]+, then matches one or more occurrences of any character except those listed. The match cannot reach the closing parenthesis because it cannot cross the double quote inside the third example: the double quote is excluded by the negated character class. The final [")] then accepts that inner double quote as the closing delimiter, so the match ends there.
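To make that concrete, here is a small sketch that runs the original pattern against the third example string. The variable names are only for illustration.

```javascript
// The pattern from the question: a word with optional *, or a span that
// opens with " or ( and closes with " or ).
const original = /\w+\*?|["(][^()"]+[")]/g;

const input = 'sonoma and (wild or "stratus kincade")';
const matches = input.match(original);

// The parenthesised span is cut short at the inner double quote, and the
// quoted words are then picked up one by one by the \w+\*? alternative.
console.log(matches);
// → [ 'sonoma', 'and', '(wild or "', 'stratus', 'kincade' ]
```

The trailing `")` is left unmatched: after the second `"` the class `[^()"]+` needs at least one character, but the next character is `)`.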


You don't need any lookarounds. Instead, split the delimited case into two separate alternatives, one for parentheses and one for double quotes, so the alternation has three parts:

\w+\*?|\([^()]+\)|"[^"]+"

Explanation

  • \w+\*? Match 1+ word chars and optional *
  • | Or
  • \([^()]+\) Match from opening till closing parenthesis using a negated character class
  • | Or
  • "[^"]+" Match from double quote to double quote using a negated character class

Regex demo

// Log the tokens matched in each of the three sample strings.
[
  `"sonoma wildfires"`,
  `sonoma and (wild* or stratus or kincade)`,
  `sonoma and (wild or "stratus kincade")`,
].forEach(s => console.log(s.match(/\w+\*?|\([^()]+\)|"[^"]+"/g)));
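If it helps to know which alternative produced each token, one possible variation is to wrap the three alternatives in named capture groups. This is only a sketch: the group names `word`, `group`, and `phrase` and the helper `classifyTokens` are invented here, and the matching itself is unchanged from the pattern above.

```javascript
// Same three alternatives as above, each wrapped in a named group.
const tokenPattern = /(?<word>\w+\*?)|(?<group>\([^()]+\))|(?<phrase>"[^"]+")/g;

// Hypothetical helper: returns { kind, text } for every token in the query.
function classifyTokens(query) {
  return Array.from(query.matchAll(tokenPattern), (m) => {
    // Exactly one named group is defined per match; find it.
    const [kind, text] = Object.entries(m.groups).find(([, v]) => v !== undefined);
    return { kind, text };
  });
}

console.log(classifyTokens('sonoma and (wild or "stratus kincade")'));
console.log(classifyTokens('"sonoma wildfires"'));
```

Since `[^()]` excludes both kinds of parenthesis, a parenthesised span that itself contains parentheses would not be matched as a single token by this pattern.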
