
Regex to match all words but AND, OR and NOT

In my JavaScript app I have this random string:

büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)

and I would like to match all words, including those with special characters and numbers, except the words AND, OR and NOT.

This is what I tried:

/(?!AND|OR|NOT)\b[\u00C0-\u017F\w\d]+/gi
which results in
["büert", "3454jhadf", "asdfsdf", "technüology", "bar", "bas"]

but this one does not match the ü, or any other letter outside the a-z alphabet, at the beginning or at the end of a word because of the \b word boundary.

Removing the \b oddly ends up matching parts of the words I would like to exclude:

/(?!AND|OR|NOT)[\u00C0-\u017F\w\d]+/gi
The result is
["büert", "ND", "OT", "3454jhadf", "üasdfsdf", "R", "technüology", "ND", "bar", "R", "bas"]

What is the correct way to match all words, no matter what type of characters they contain, except the ones I want to exclude?

The issue here has its roots in the fact that \b (and \w, and other shorthand classes) are not Unicode-aware in JavaScript: \w only matches [A-Za-z0-9_], so ü counts as a non-word character and \b never matches between a space and ü.
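A quick way to see both effects from the question is the following minimal sketch. It uses only built-in RegExp behaviour; the variable names and the short test strings are illustrative and not part of the original post.

```javascript
// \w is ASCII-only ([A-Za-z0-9_]), so "ü" is a non-word character.
const umlautIsWordChar = /\w/.test("ü");     // false

// \b only matches between a \w character and a non-\w character (or a string edge).
// At the start of "üasdfsdf" both sides are non-word, so the first boundary
// sits between "ü" and "a" and the leading "ü" is lost.
const leading = "üasdfsdf".match(/\b\w+/g);  // ["asdfsdf"]

// Without \b, the lookahead is checked only where a match attempt starts.
// It rejects a start at the "A" of "AND", but a start at "N" is allowed,
// which is why fragments such as "ND" appear.
const fragments = "x AND y".match(/(?!AND|OR|NOT)\w+/g); // ["x", "ND", "y"]

console.log(umlautIsWordChar, leading, fragments);
```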

Now, there are 2 ways to achieve what you want.

1. SPLIT WITH PATTERN(S) YOU WANT TO DISCARD

    var re = /\s*\b(?:AND|OR|NOT)\b\s*|[()]/;
    var s = "büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)";
    var res = s.split(re).filter(Boolean);
    document.body.innerHTML += JSON.stringify(res, 0, 4);
    // => [ "büert", "3454jhadf üasdfsdf", "technüology", "bar", "bas" ]

Note the use of a non-capturing group (?:...) so that the unwanted words are not included in the resulting array. Also, you need to add all punctuation and other unwanted characters to the character class ([()] in the pattern above). Whitespace is not in that class, which is why "3454jhadf üasdfsdf" stays as one item.
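As a sketch of what "adding unwanted characters to the character class" could look like, the variant below also splits on whitespace and a small set of punctuation so that every word becomes its own item. The helper name `splitWords` and the exact separator set are assumptions for illustration, not part of the original answer.

```javascript
// Hypothetical helper: split on the operator words and on an illustrative
// set of separators (whitespace, parentheses, basic punctuation).
function splitWords(input) {
  const separators = /\s*\b(?:AND|OR|NOT)\b\s*|[\s(),.;:!?]+/;
  // split() leaves empty strings between adjacent separators; drop them.
  return input.split(separators).filter(Boolean);
}

const words = splitWords(
  "büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)"
);
console.log(words);
// ["büert", "3454jhadf", "üasdfsdf", "technüology", "bar", "bas"]
```

Because this still relies on \b, an operator directly followed by a non-ASCII letter (for example "NOTü") would also be treated as a separator; the custom-boundary approach avoids that.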

2. MATCH USING CUSTOM BOUNDARIES

You can build your own boundaries from groups that combine anchors with a negated character class, in a regex like this:

(^|[^\u00C0-\u017F\w])(?!(?:AND|OR|NOT)(?=[^\u00C0-\u017F\w]|$))([\u00C0-\u017F\w]+)(?=[^\u00C0-\u017F\w]|$)

Capture group 2 will hold the values you need.

See regex demo

JS code demo:

    var re = /(^|[^\u00C0-\u017F\w])(?!(?:AND|OR|NOT)(?=[^\u00C0-\u017F\w]|$))([\u00C0-\u017F\w]+)(?=[^\u00C0-\u017F\w]|$)/gi;
    var str = 'büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)';
    var m;
    var arr = [];
    while ((m = re.exec(str)) !== null) {
        arr.push(m[2]); // group 2 holds the word; group 1 is the leading boundary
    }
    document.body.innerHTML += JSON.stringify(arr);

or with a block to build the regex dynamically:

    var bndry = "[^\\u00C0-\\u017F\\w]";
    var re = RegExp("(^|" + bndry + ")" +            // starting boundary
        "(?!(?:AND|OR|NOT)(?=" + bndry + "|$))" +    // restriction
        "([\\u00C0-\\u017F\\w]+)" +                  // match and capture our string
        "(?=" + bndry + "|$)"                        // set trailing boundary
        , "g");
    var str = 'büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)';
    var m, arr = [];
    while ((m = re.exec(str)) !== null) {
        arr.push(m[2]);
    }
    document.body.innerHTML += JSON.stringify(arr);

Explanation:

  • (^|[^\u00C0-\u017F\w]) - our custom leading boundary: match the string start with ^ or any character outside the [\u00C0-\u017F\w] set
  • (?!(?:AND|OR|NOT)(?=[^\u00C0-\u017F\w]|$)) - a restriction on the match: the match fails if AND, OR or NOT appears here as a whole word, that is, followed by the string end or by a character outside the [\u00C0-\u017F\w] set
  • ([\u00C0-\u017F\w]+) - match and capture (group 2) one or more word characters ([a-zA-Z0-9_]) or characters from the \u00C0-\u017F range
  • (?=[^\u00C0-\u017F\w]|$) - the trailing boundary: either the string end ($) or a character outside the [\u00C0-\u017F\w] set. It is a lookahead, so the character is not consumed and can serve as the leading boundary of the next word.
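The pieces above can be wrapped into a small reusable function. This is only a sketch: the name `matchWordsExcept` and the `stopWords` parameter are hypothetical, and the stop words are assumed to be plain words without regex metacharacters.

```javascript
// Hypothetical helper built on the custom-boundary pattern from the answer.
// Assumption: `stopWords` contains plain words (no regex metacharacters).
function matchWordsExcept(input, stopWords) {
  const wordChar = "\\u00C0-\\u017F\\w"; // \u00C0-\u017F range plus [A-Za-z0-9_]
  const boundary = "[^" + wordChar + "]";
  const re = new RegExp(
    "(^|" + boundary + ")" +                                        // leading boundary (consumed)
    "(?!(?:" + stopWords.join("|") + ")(?=" + boundary + "|$))" +   // reject whole stop words only
    "([" + wordChar + "]+)" +                                       // group 2: the word itself
    "(?=" + boundary + "|$)",                                       // trailing boundary (not consumed)
    "g"
  );
  const result = [];
  let m;
  while ((m = re.exec(input)) !== null) {
    result.push(m[2]);
  }
  return result;
}

const found = matchWordsExcept(
  "büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)",
  ["AND", "OR", "NOT"]
);
console.log(found);
// ["büert", "3454jhadf", "üasdfsdf", "technüology", "bar", "bas"]
```

Since the restriction requires a boundary after the stop word, a longer word that merely starts with one is still matched: `matchWordsExcept("ANDROID AND iOS", ["AND", "OR", "NOT"])` returns `["ANDROID", "iOS"]`.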
