How to use regex to extract boolean operators followed by words until the next operator?

I'm trying to put together a relatively simple expression for extracting boolean string operators (AND, OR, NOT, etc.) from user input, so that each entry in the resulting array of matches contains an operator and the words that follow it, up to the next operator:

const query = 'lorem AND ipsum dolor OR fizz NOT buzz';

The result should look like this:

[
 ['AND', 'ipsum dolor'],
 ['OR', 'fizz'],
 ['NOT', 'buzz']
]

I've created this pattern for getting a single word after each operator, which works fine:

^(\w+\s?)+?|(AND) (\w+)|(OR) (\w+)|(NOT) (\w+)

I then tried to modify it to handle multiple words after an operator in order to obtain the result above, but it's always greedy and captures the whole input string:

(AND|OR|NOT) (\w+\s?)+ (?:AND|OR|NOT)
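A quick way to see the greediness is to run the attempted pattern next to a lazy variant. This is a sketch assuming a Node.js-style runtime with `String.prototype.matchAll`; the `lazy` pattern is an illustrative rewrite, not code from the question. The greedy `(\w+\s?)+` runs over the later operators and then backtracks only as far as the last ` AND|OR|NOT` it can still find, so the first match stretches all the way to `NOT`:

```javascript
const query = 'lorem AND ipsum dolor OR fizz NOT buzz';

// Pattern from the question: (\w+\s?)+ is greedy, so it runs over later
// operators and backtracks only to the LAST " AND|OR|NOT" it can find.
const greedy = /(AND|OR|NOT) (\w+\s?)+ (?:AND|OR|NOT)/;
const greedyMatch = query.match(greedy);
console.log(greedyMatch[0]); // AND ipsum dolor OR fizz NOT

// Illustrative lazy variant: take as few words as possible, then require
// (via a lookahead, so it is not consumed) another operator or the end.
const lazy = /\b(AND|OR|NOT) (\w+(?: \w+)*?)(?= (?:AND|OR|NOT)\b|$)/g;
const pairs = [...query.matchAll(lazy)].map(m => [m[1], m[2]]);
// pairs holds [['AND', 'ipsum dolor'], ['OR', 'fizz'], ['NOT', 'buzz']]
console.log(pairs);
```

The lazy version assumes words are separated by single spaces, as in the sample input. Writing the word list as `\w+(?: \w+)*?` rather than `(?:\w+\s?)+?` keeps each way of splitting the text unambiguous, which avoids heavy backtracking when a match fails.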

UPDATE

I've figured it out, but I'm not sure how pretty or efficient it is:

^(\w+)|(AND|OR|NOT) (.*?(?= AND|OR|NOT))|(AND|OR|NOT) .*?$
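Running the updated pattern shows what it actually returns. In this sketch (Node.js assumed; the `g` flag and the post-processing step are illustrative additions, not the asker's code) the leading `lorem` becomes a match of its own, and because the lookahead's alternation reads as ` AND` or `OR` or `NOT` (only `AND` carries the leading space), text followed by `OR` or `NOT` keeps a trailing space. The last operator's text is only present in the full match, since the third alternative has no capture group for it. A filter plus trim turns the raw matches into the requested pairs:

```javascript
const query = 'lorem AND ipsum dolor OR fizz NOT buzz';

// Pattern from the update, with the g flag added so matchAll can be used.
const userRegex = /^(\w+)|(AND|OR|NOT) (.*?(?= AND|OR|NOT))|(AND|OR|NOT) .*?$/g;

// Full text of every match, to show the leftover spaces.
const raw = [...query.matchAll(userRegex)].map(m => m[0]);
// raw holds ['lorem', 'AND ipsum dolor ', 'OR fizz ', 'NOT buzz']

const pairs = [...query.matchAll(userRegex)]
  .filter(m => m[1] === undefined)             // drop the ^(\w+) match ("lorem")
  .map(m => {
    const op = m[2] || m[4];                    // operator from alternative 2 or 3
    return [op, m[0].slice(op.length).trim()];  // rest of the match, trimmed
  });
// pairs holds [['AND', 'ipsum dolor'], ['OR', 'fizz'], ['NOT', 'buzz']]
console.log(raw, pairs);
```

If the input itself starts with an operator, `^(\w+)` consumes it as the leading word, so that case would need separate handling.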

You might also use a negative lookahead to assert that the words after the operator do not start with any of the alternatives:

\b(AND|OR|NOT) ((?!AND|OR|NOT)\b\w+(?: (?!AND|OR|NOT)\w+)*)


const regex = /\b(AND|OR|NOT) ((?!AND|OR|NOT)\b\w+(?: (?!AND|OR|NOT)\w+)*)/gm;
const str = `lorem AND ipsum dolor OR fizz NOT buzz`;
let m;
let result = [];

// exec() with the g flag advances lastIndex, so this loops over every match
while ((m = regex.exec(str)) !== null) {
  result.push([m[1], m[2]]);  // [operator, following words]
}
console.log(result);
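One caveat with `(?!AND|OR|NOT)`: it rejects any word that merely starts with those letters, so an uppercase word such as `ORANGE` ends the phrase, or, right after an operator, makes that whole match fail. Adding a word boundary inside the lookahead limits it to whole-word operators. A sketch of that adjustment (the sample string is made up for illustration):

```javascript
const str = 'lorem AND ORANGE juice OR fizz';

// Lookahead as in the answer: rejects any word that STARTS with AND/OR/NOT.
const prefixCheck = /\b(AND|OR|NOT) ((?!AND|OR|NOT)\b\w+(?: (?!AND|OR|NOT)\w+)*)/g;
// Adjusted lookahead: rejects only the whole words AND/OR/NOT.
const wordCheck = /\b(AND|OR|NOT) ((?!(?:AND|OR|NOT)\b)\w+(?: (?!(?:AND|OR|NOT)\b)\w+)*)/g;

// Collect [operator, words] pairs for a given pattern.
const run = regex => [...str.matchAll(regex)].map(m => [m[1], m[2]]);

console.log(run(prefixCheck)); // the AND group is lost, only ['OR', 'fizz'] remains
console.log(run(wordCheck));   // ['AND', 'ORANGE juice'] and ['OR', 'fizz']
```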

I'm not able to do that with a regexp, but here is a super simple solution that could work:

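const q = 'lorem AND ipsum dolor OR fizz NOT buzz';
const special = ['AND', 'OR', 'NOT'];

const fullResult = [];
// Skip words that appear before the first operator (here "lorem")
let skip = true;

q.split(' ').forEach(word => {
    if (special.includes(word)) {
        // An operator starts a new group
        fullResult.push([word]);
        skip = false;
    } else if (!skip) {
        // Any other word is appended to the current operator's group
        fullResult[fullResult.length - 1].push(word);
    }
});

// Each word stays a separate element: [['AND', 'ipsum', 'dolor'], ['OR', 'fizz'], ['NOT', 'buzz']]
console.log(fullResult);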
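This split-based approach keeps each word as a separate array element (`['AND', 'ipsum', 'dolor']`) rather than the `['AND', 'ipsum dolor']` shape asked for in the question. A minimal sketch, assuming single-space-separated input as in the example, that joins the words after grouping; it checks `groups.length` instead of a `skip` flag, which has the same effect:

```javascript
const q = 'lorem AND ipsum dolor OR fizz NOT buzz';
const special = ['AND', 'OR', 'NOT'];

const groups = [];
for (const word of q.split(' ')) {
  if (special.includes(word)) {
    groups.push([word]);                   // an operator opens a new group
  } else if (groups.length > 0) {
    groups[groups.length - 1].push(word);  // words before the first operator are skipped
  }
}

// Collapse each group to [operator, 'word word ...'].
const result = groups.map(([op, ...words]) => [op, words.join(' ')]);
console.log(result);
```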

I don't think you can get there purely with regular expressions in JavaScript, but you can get awfully close:

const query = 'lorem AND ipsum dolor OR fizz NOT buzz';
const rex = /\b(AND|OR|NOT|NEAR)\b\s*(.*?)\s*(?=$|\b(?:AND|OR|NOT|NEAR)\b)/ig;
const result = [...query.matchAll(rex)].map(([_, op, text]) => [op, text]);
console.log(result);

The regex /\b(AND|OR|NOT|NEAR)\b\s*(.*?)\s*(?=$|\b(?:AND|OR|NOT|NEAR)\b)/ig looks for:

  • A word boundary
  • One of your operators (capturing it)
  • A word boundary
  • Zero or more whitespace characters
  • A non-greedy match of any characters (capturing it)
  • Zero or more whitespace characters
  • Either another operator or the end of the string (inside a lookahead, so it is not consumed)
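Because the pattern carries the `i` flag, lowercase `and`, `or`, `not` and `near` inside the user's text are also treated as operators. Whether that is wanted depends on the input; the sketch below (sample query made up for illustration) compares the same pattern with and without `i`:

```javascript
const query = 'rock and roll OR jazz';

// Same pattern as above, once case-insensitive and once case-sensitive.
const withI = /\b(AND|OR|NOT|NEAR)\b\s*(.*?)\s*(?=$|\b(?:AND|OR|NOT|NEAR)\b)/ig;
const withoutI = /\b(AND|OR|NOT|NEAR)\b\s*(.*?)\s*(?=$|\b(?:AND|OR|NOT|NEAR)\b)/g;

// Drop the full-match entry, keep [operator, text].
const pairs = rex => [...query.matchAll(rex)].map(([_, op, text]) => [op, text]);

console.log(pairs(withI));    // lowercase "and" splits too: [['and', 'roll'], ['OR', 'jazz']]
console.log(pairs(withoutI)); // only the uppercase OR counts: [['OR', 'jazz']]
```

With `i`, the captured operator keeps the casing from the input (`'and'` above), so you may want to normalize it, for example with `op.toUpperCase()`.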

The map after the matchAll call is just there to remove the initial array entry (the one with the full text of the match). I've done it with destructuring, but you could use slice instead:

const result = [...query.matchAll(rex)].map(match => match.slice(1));

const query = 'lorem AND ipsum dolor OR fizz NOT buzz';
const rex = /\b(AND|OR|NOT|NEAR)\b\s*(.*?)\s*(?=$|\b(?:AND|OR|NOT|NEAR)\b)/ig;
const result = [...query.matchAll(rex)].map(match => match.slice(1));
console.log(result);

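Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM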