Find contents between square brackets and quotations

So to put it straight, let's say I have this string:

command [stuff] [stuff [inside] this] "string" "another [thing] string"

Inside of my code I want to grab everything in quotation marks and put it in an array, and grab everything inside the outermost brackets (everything inside of the outside brackets) and put it in its own array. Like so:

const string = `command [stuff] [stuff [inside] this] "string" "another [thing] string"`;

let quotations = ["string", "another [thing] string"]
let brackets = ["stuff", "stuff [inside] this"] // I do not want to include any brackets found inside of quotation marks

I have tried to write a regex that would do this, but I am having a lot of trouble understanding how to set it up. I did find these two regexes, which find the stuff in quotations and brackets, but they aren't exactly what I am looking for:

// JavaScript Regex
const regexStrings = /(["'])(?:(?=(\\?))\2.)*?\1/g;
const regexBrackets = /\[(.*?)\]/g;

Here's an attempt.

The regex for the quotes will try to find non-quotes between quotes.

The regex for the brackets will first try to match non-quotes between quotes, and then the stuff between brackets.
The matches that start with a quote are then filtered out.

There's no recursion, so it only handles one level of optional brackets within brackets.

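const string = `command [stuff] [stuff [inside] this] "string" "another [thing] string"`;

// JavaScript Regex
const regexStrings = /"[^"]*"|'[^']*'/g;
const regexBrackets = /"[^"]*"|'[^']*'|(\[[^\[\]]*(?:\[[^\[\]]*\])?[^\[\]]*\])/g;

// Every match starts and ends with a quote character, so drop the first and last character.
let quotations = string.match(regexStrings)
  .map(x => x.slice(1, -1));
// Quoted strings are matched first so brackets inside them are skipped, then filtered out.
let brackets = string.match(regexBrackets)
  .filter(x => !/^["']/.test(x));

console.log(quotations); // ["string", "another [thing] string"]
console.log(brackets);   // ["[stuff]", "[stuff [inside] this]"] (outer brackets are kept)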

There is no support for recursion in JavaScript's regex syntax, so you'll need to throw in some code in order to cope with an arbitrary depth of bracket nesting.

One way is to split the string into tokens, then use a depth counter to keep track of how deeply nested the brackets are, so you know when to build a bracket substring by concatenating the tokens along the way.
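A minimal sketch of that idea follows. The token split (double-quoted strings, single brackets, runs of other characters) and the helper name `extractGroups` are assumptions made for illustration, not something taken from the answer itself.

```javascript
// Hypothetical helper: collects top-level quoted strings and the contents
// of outermost bracket groups, at any nesting depth.
function extractGroups(input) {
  // Tokens: a double-quoted string, a single bracket, a run of other
  // characters, or a lone (unmatched) double quote.
  const tokenRegex = /"[^"]*"|[\[\]]|[^\[\]"]+|"/g;
  const quotations = [];
  const brackets = [];
  let depth = 0;    // current bracket nesting level
  let current = ""; // text collected for the bracket group being built

  for (const [token] of input.matchAll(tokenRegex)) {
    if (token === "[") {
      if (depth > 0) current += token; // inner bracket: keep it as text
      depth++;
    } else if (token === "]") {
      if (depth === 0) continue;       // stray closing bracket: ignore it
      depth--;
      if (depth === 0) {
        brackets.push(current);        // the outermost group just closed
        current = "";
      } else {
        current += token;
      }
    } else if (depth > 0) {
      current += token;                // anything inside brackets, quotes included
    } else if (token.length > 1 && token.startsWith('"')) {
      quotations.push(token.slice(1, -1)); // top-level quoted string
    }
  }
  return { quotations, brackets };
}

const result = extractGroups(
  'command [stuff] [stuff [inside] this] "string" "another [thing] string"'
);
console.log(result.quotations); // ["string", "another [thing] string"]
console.log(result.brackets);   // ["stuff", "stuff [inside] this"]
```

Because the quoted-string alternative is tried first, brackets inside a top-level quoted string never reach the depth counter.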

Suppose the given string is as follows.

'command [stuff] [stuff [inside] this] "string" "another [thing] string"'
          bbbbb   bbbbbbbbbbbbbbbbbbb   dddddd   dddddddddddddddddddddd         

We wish to extract the values marked bbb... (within brackets) to one array and values marked ddd... (within double-quotes) to a second array. This can be done in two steps.

Step 1: extract all strings within double-quotes and replace those matches, including the surrounding double-quotes, with empty strings

Replace matches of the following regular expression (with the g flag set) with empty strings. 1

"([^"]*)"

That will return

'command [stuff] [stuff [inside] this]  '
          bbbbb   bbbbbbbbbbbbbbbbbbb

which we will use in the second step, as shown below.

As well, the contents of capture group 1 will be 'string' and 'another [thing] string' , which we must save.

This expression reads, "match a double-quote followed by zero or more characters other than a double-quote, followed by a double-quote, with the string bounded by the double-quotes saved to capture group 1".
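Step 1 might look like this in JavaScript. The variable names are illustrative only; `matchAll` collects capture group 1 of each match and `replace` removes the quoted strings along with their delimiters.

```javascript
const input = 'command [stuff] [stuff [inside] this] "string" "another [thing] string"';
const quotedRegex = /"([^"]*)"/g;

// Save the contents of capture group 1 for every match.
const quotations = Array.from(input.matchAll(quotedRegex), (m) => m[1]);

// Replace each match, including the surrounding double-quotes, with an empty string.
const withoutQuotes = input.replace(quotedRegex, "");

console.log(quotations);                    // ["string", "another [thing] string"]
console.log(JSON.stringify(withoutQuotes)); // "command [stuff] [stuff [inside] this]  "
```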

Step 2: extract all strings delimited with brackets that are not within a string that is delimited with brackets

We can obtain the strings of interest ( 'stuff' and 'stuff [inside] this' ) by matching the regular expression

(?<=\[)[^\[\]]*(?:\[[^\[\]]*\])?[^\[\]]*(?=\])

This expression can be broken down as follows.

(?<=\[)     # positive lookbehind asserts match is preceded by '['
[^\[\]]*    # match 0+ chars other than '[' and ']'
(?:         # begin non-capture group
  \[        # match '['
  [^\[\]]*  # match 0+ chars other than '[' and ']'
  \]        # match ']'
)?          # end non-capture group and make it optional
[^\[\]]*    # match 0+ chars other than '[' and ']'
(?=\])      # positive lookahead asserts match is followed by ']'
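Applied to the string left over from Step 1, the expression can be used as sketched below. This assumes a JavaScript engine that supports lookbehind assertions (current Node.js and evergreen browsers do); the variable names are illustrative.

```javascript
// The result of Step 1: the original string with the quoted parts removed.
const stripped = "command [stuff] [stuff [inside] this]  ";

const bracketRegex = /(?<=\[)[^\[\]]*(?:\[[^\[\]]*\])?[^\[\]]*(?=\])/g;

// match() returns null when nothing matches, so fall back to an empty array.
const brackets = stripped.match(bracketRegex) || [];

console.log(brackets); // ["stuff", "stuff [inside] this"]
```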

Note that this expression does not work with more than one level of nesting, such as

'[stuff [inside [stuff] like] this]'
  bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

The regular expression I've given could be modified to handle up to any given number of levels of nesting by extending the approach I have taken, but it becomes unwieldy for more than three levels of nesting.
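Rather than writing the extended expression by hand, it could be generated. The sketch below is one possible way to do that, not the answer's own code: `buildBracketRegex` is a hypothetical helper, and it repeats the nested group with `*` instead of `?`, so several sibling groups at the same level are also accepted.

```javascript
// Hypothetical helper: builds a regex matching the contents of outermost
// brackets that contain at most `maxNested` levels of brackets inside them.
function buildBracketRegex(maxNested) {
  const plain = "[^\\[\\]]*"; // 0+ chars other than '[' and ']'
  let content = plain;        // content with no nested brackets
  for (let i = 0; i < maxNested; i++) {
    // Allow bracket groups whose content is one level shallower.
    content = `${plain}(?:\\[${content}\\]${plain})*`;
  }
  return new RegExp(`(?<=\\[)${content}(?=\\])`, "g");
}

const twoLevels = "[stuff [inside [stuff] like] this]".match(buildBracketRegex(2));
console.log(twoLevels); // ["stuff [inside [stuff] like] this"]
```

The generated pattern grows with every level, so for arbitrary depth a counter-based scan is likely the simpler choice.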

1. Alternatively, we could write "(.*?)" . Making .* lazy ( ? ) prevents the match from gobbling up characters, including double-quotes, until the last double-quote in the string is reached. If we were to use ".*" (a greedy match) we would obtain the single match, '"string" "another [thing] string"' .
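The difference between the lazy and greedy forms can be checked with a quick experiment (the sample string here is a shortened version of the one above):

```javascript
const sample = 'command "string" "another [thing] string"';

// Lazy: each match stops at the nearest closing double-quote.
const lazy = sample.match(/"(.*?)"/g);
// Greedy: the match runs on to the last double-quote in the string.
const greedy = sample.match(/".*"/g);

console.log(lazy);   // ['"string"', '"another [thing] string"']
console.log(greedy); // ['"string" "another [thing] string"']
```

Note that `.` does not match line terminators by default, whereas `[^"]*` does, so the two forms differ for quoted strings that span multiple lines.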
