
How to use regex to extract boolean operators followed by words until the next operator?

I'm trying to put together a relatively simple expression for extracting boolean string operators (AND, OR, NOT, etc.) from user input, so that the resulting array of matches contains each operator together with the words that follow it, up to the next operator:

const query = 'lorem AND ipsum dolor OR fizz NOT buzz';

The results should look like:

[
 ['AND', 'ipsum dolor'],
 ['OR', 'fizz'],
 ['NOT', 'buzz']
]

I've created this for getting single words after each operator, which is fine:

^(\w+\s?)+?|(AND) (\w+)|(OR) (\w+)|(NOT) (\w+)

Then I tried to modify it to handle multiple words after an operator in order to obtain the above result, but it's always greedy and captures the whole input string:

(AND|OR|NOT) (\w+\s?)+ (?:AND|OR|NOT)

UPDATE

I've figured it out, but I'm not sure how pretty or efficient it is:

^(\w+)|(AND|OR|NOT) (.*?(?= AND|OR|NOT))|(AND|OR|NOT) .*?$

You might also use a negative lookahead to assert that the words after the operator do not start with any of the alternatives:

\b(AND|OR|NOT) ((?!AND|OR|NOT)\b\w+(?: (?!AND|OR|NOT)\w+)*)


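const regex = /\b(AND|OR|NOT) ((?!AND|OR|NOT)\b\w+(?: (?!AND|OR|NOT)\w+)*)/gm;
const str = `lorem AND ipsum dolor OR fizz NOT buzz`;
let m;
let result = [];

// exec() with the g flag resumes after the previous match on each call
while ((m = regex.exec(str)) !== null) {
    // m[1] is the operator, m[2] is the phrase that follows it
    result.push([m[1], m[2]]);
}

console.log(result);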
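One caveat with the lookahead as written: `(?!AND|OR|NOT)` rejects any word that merely begins with an operator, such as "ORANGE" or "NOTE". A minimal sketch of a stricter variant follows, assuming that only whole-word operators should end a phrase. The names `strictRegex` and `extractPairs` are made up for illustration and are not part of the answer above.

```javascript
// Hypothetical variant: a \b after the alternation inside each lookahead, so
// that only whole-word operators stop the phrase ("ORANGE" and "NOTE" are kept).
const strictRegex = /\b(AND|OR|NOT) ((?!(?:AND|OR|NOT)\b)\w+(?: (?!(?:AND|OR|NOT)\b)\w+)*)/g;

function extractPairs(input) {
  // matchAll requires the g flag; each match is [fullMatch, operator, phrase].
  return [...input.matchAll(strictRegex)].map(([, op, phrase]) => [op, phrase]);
}

const pairs = extractPairs('lorem AND ORANGE juice OR NOTE NOT buzz');
console.log(pairs);
// [ [ 'AND', 'ORANGE juice' ], [ 'OR', 'NOTE' ], [ 'NOT', 'buzz' ] ]
```

With the unanchored lookahead, the same input would only yield the final `NOT buzz` pair, because "ORANGE" and "NOTE" both fail the lookahead.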

I'm not able to do that with a regexp, but this is a super simple solution that could work.

let q = 'lorem AND ipsum dolor OR fizz NOT buzz';
let special = ['AND', 'OR', 'NOT'];

let fullResult = [];
let skip = true;

q.split(' ').forEach( word => {
    if ( special.indexOf(word) !== -1 ) {
        fullResult.push([word]);
        skip = false;
    } else if (!skip){
        fullResult[fullResult.length-1].push(word);
    }
});

console.log(fullResult);
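The loop above stores each word as a separate array element (for example `['AND', 'ipsum', 'dolor']`), whereas the question asks for one phrase per operator. A possible adaptation is sketched below, assuming words are separated by single spaces as in the sample query. The function name `groupByOperator` is hypothetical.

```javascript
const operators = ['AND', 'OR', 'NOT'];

function groupByOperator(input) {
  const groups = [];
  for (const word of input.split(' ')) {
    if (operators.includes(word)) {
      // Start a new [operator, words] group.
      groups.push([word, []]);
    } else if (groups.length > 0) {
      // Words before the first operator are ignored, as in the loop above.
      groups[groups.length - 1][1].push(word);
    }
  }
  // Join each group's words into a single phrase.
  return groups.map(([op, words]) => [op, words.join(' ')]);
}

const grouped = groupByOperator('lorem AND ipsum dolor OR fizz NOT buzz');
console.log(grouped);
// [ [ 'AND', 'ipsum dolor' ], [ 'OR', 'fizz' ], [ 'NOT', 'buzz' ] ]
```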

I don't think you can get there purely with regular expressions in JavaScript, but you can get awfully close:

const query = 'lorem AND ipsum dolor OR fizz NOT buzz';
const rex = /\b(AND|OR|NOT|NEAR)\b\s*(.*?)\s*(?=$|\b(?:AND|OR|NOT|NEAR)\b)/ig;
const result = [...query.matchAll(rex)].map(([_, op, text]) => [op, text]);
console.log(result);

The regex /\b(AND|OR|NOT|NEAR)\b\s*(.*?)\s*(?=$|\b(?:AND|OR|NOT|NEAR)\b)/ig looks for:

  • A word break
  • One of your operators (capturing it)
  • A word break
  • Zero or more whitespace chars
  • A non-greedy match for anything (capturing it)
  • Zero or more whitespace chars
  • Either another operator or the end of the string (checked with a lookahead, so it is not consumed and the next match can start at that operator)
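To make the parts listed above easier to see, here is a sketch that assembles the same pattern piece by piece, using named capture groups instead of numbered ones. The helper `buildOperatorRegex` and the group names `op` and `text` are assumptions made for illustration. The helper also assumes the operator names contain no regex metacharacters.

```javascript
// Hypothetical helper: builds the pattern described above from a list of operators.
function buildOperatorRegex(ops) {
  const alt = ops.join('|'); // assumes no regex metacharacters in the names
  return new RegExp(
    `\\b(?<op>${alt})\\b` +    // word break, operator (captured), word break
    `\\s*(?<text>.*?)\\s*` +   // whitespace, non-greedy capture, whitespace
    `(?=$|\\b(?:${alt})\\b)`,  // lookahead: end of string or the next operator
    'ig'
  );
}

const namedRex = buildOperatorRegex(['AND', 'OR', 'NOT', 'NEAR']);
const named = [...'lorem AND ipsum dolor OR fizz NOT buzz'.matchAll(namedRex)]
  .map(m => [m.groups.op, m.groups.text]);
console.log(named);
// [ [ 'AND', 'ipsum dolor' ], [ 'OR', 'fizz' ], [ 'NOT', 'buzz' ] ]
```

Note that because of the `i` flag, a standalone lowercase word such as "or" or "not" in the input is also treated as an operator. Drop the flag if operators must be uppercase.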

The map after the matchAll call is just there to remove the initial array entry (the one with the full text of the match). I've done it with destructuring, but you could use slice instead:

const result = [...query.matchAll(rex)].map(match => match.slice(1));

const query = 'lorem AND ipsum dolor OR fizz NOT buzz';
const rex = /\b(AND|OR|NOT|NEAR)\b\s*(.*?)\s*(?=$|\b(?:AND|OR|NOT|NEAR)\b)/ig;
const result = [...query.matchAll(rex)].map(match => match.slice(1));
console.log(result);
