Regex to match all words but AND, OR and NOT
In my JavaScript app I have this random string:
büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)
and I would like to match all words, including special characters and numbers, except the words AND, OR and NOT.
I tried this:
/(?!AND|OR|NOT)\b[\u00C0-\u017F\w\d]+/gi
which results in
["büert", "3454jhadf", "asdfsdf", "technüology", "bar", "bas"]
but this one does not match the ü, or any other letter outside the a-z alphabet, at the beginning or at the end of a word because of the \b word boundary.
Removing the \b oddly ends up matching parts of the words I would like to exclude:
/(?!AND|OR|NOT)[\u00C0-\u017F\w\d]+/gi
The result is
["büert", "ND", "OT", "3454jhadf", "üasdfsdf", "R", "technüology", "ND", "bar", "R", "bas"]
What is the correct way to match all words, no matter what type of characters they contain, except the ones I want to exclude?
The issue here has its roots in the fact that \b (and \w, and other shorthand classes) are not Unicode-aware in JavaScript.
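To make both symptoms from the question concrete, here is a minimal sketch (the variable names are illustrative only, and "\u00FC" is simply the escape for ü):

```javascript
// \w is ASCII-only ([A-Za-z0-9_]), so "ü" counts as a non-word character.
var isWordChar = /\w/.test("\u00FC");

// There is no \b between the string start and "ü" (both sides are "non-word"),
// so the first boundary the engine finds is between "ü" and "a".
var withBoundary = "\u00FCasdfsdf".match(/\b[\u00C0-\u017F\w]+/)[0];

// Without \b, the lookahead only rejects the position where "AND" starts;
// the engine then retries one character later and matches the rest.
var withoutBoundary = "AND".match(/(?!AND|OR|NOT)[\u00C0-\u017F\w]+/)[0];

console.log(isWordChar, withBoundary, withoutBoundary);
// logs: false asdfsdf ND
```

This is why the first attempt loses the leading ü, and why the second one returns fragments such as "ND", "OT" and "R".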
Now, there are 2 ways to achieve what you want. The first is to split the string on the words you want to exclude:
    var re = /\s*\b(?:AND|OR|NOT)\b\s*|[()]/;
    var s = "büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)";
    // Split on the unwanted words and parentheses, then drop empty strings
    var res = s.split(re).filter(Boolean);
    document.body.innerHTML += JSON.stringify(res, null, 4);
    // => [ "büert", "3454jhadf üasdfsdf", "technüology", "bar", "bas" ]
Note the use of a non-capturing group (?:...) so as not to include the unwanted words in the resulting array. Also, you need to add all punctuation and other unwanted characters to the character class ([()] in this example).
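The two remarks above can be checked with a short sketch. The last pattern is an assumed extension, not part of the original answer: it adds \s to the character class (with a + quantifier) so that plain whitespace also separates words.

```javascript
var s = "büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)";

// With a capturing group, split() puts the captured separators into the result.
var withCapture = "foo AND bar".split(/\s*\b(AND|OR|NOT)\b\s*/);

// With a non-capturing group, the separators are dropped.
var withoutCapture = "foo AND bar".split(/\s*\b(?:AND|OR|NOT)\b\s*/);

// Assumed extension: whitespace added to the class of unwanted characters.
var words = s.split(/\s*\b(?:AND|OR|NOT)\b\s*|[()\s]+/).filter(Boolean);

console.log(JSON.stringify(withCapture));    // ["foo","AND","bar"]
console.log(JSON.stringify(withoutCapture)); // ["foo","bar"]
console.log(JSON.stringify(words));
// ["büert","3454jhadf","üasdfsdf","technüology","bar","bas"]
```

Keep in mind that \b is still ASCII-based here, so a keyword written directly after a non-ASCII letter (for example üAND) would still be treated as a separator.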
Alternatively, you can match the words directly, using groups with anchors and negated character classes as custom word boundaries in a regex like this:
(^|[^\u00C0-\u017F\w])(?!(?:AND|OR|NOT)(?=[^\u00C0-\u017F\w]|$))([\u00C0-\u017F\w]+)(?=[^\u00C0-\u017F\w]|$)
Capture group 2 will hold the values you need.
JS code demo:
    var re = /(^|[^\u00C0-\u017F\w])(?!(?:AND|OR|NOT)(?=[^\u00C0-\u017F\w]|$))([\u00C0-\u017F\w]+)(?=[^\u00C0-\u017F\w]|$)/gi;
    var str = 'büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)';
    var m;
    var arr = [];
    // Collect capture group 2 (the word itself) from every match
    while ((m = re.exec(str)) !== null) {
        arr.push(m[2]);
    }
    document.body.innerHTML += JSON.stringify(arr);
or with a block to build the regex dynamically:
    var bndry = "[^\\u00C0-\\u017F\\w]";
    var re = RegExp("(^|" + bndry + ")" +              // starting boundary
        "(?!(?:AND|OR|NOT)(?=" + bndry + "|$))" +      // restriction
        "([\\u00C0-\\u017F\\w]+)" +                    // match and capture our string
        "(?=" + bndry + "|$)"                          // set trailing boundary
        , "g");
    var str = 'büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)';
    var m, arr = [];
    while ((m = re.exec(str)) !== null) {
        arr.push(m[2]);
    }
    document.body.innerHTML += JSON.stringify(arr);
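Building the pattern from strings also makes it easy to pass in the list of excluded words. The following is only a sketch: matchWordsExcept is a hypothetical helper name, not something from the original answer, and it escapes regex metacharacters in the excluded words as a precaution.

```javascript
// Hypothetical helper: returns every word in str except those listed in excluded.
function matchWordsExcept(str, excluded) {
  var wordChars = "\\u00C0-\\u017F\\w";
  var bndry = "[^" + wordChars + "]";
  // Escape regex metacharacters so each excluded word is matched literally.
  var alternation = excluded.map(function (w) {
    return w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }).join("|");
  var re = new RegExp(
    "(^|" + bndry + ")" +                                  // starting boundary
    "(?!(?:" + alternation + ")(?=" + bndry + "|$))" +     // restriction
    "([" + wordChars + "]+)" +                             // the word, group 2
    "(?=" + bndry + "|$)",                                 // trailing boundary
    "g");
  var m, arr = [];
  while ((m = re.exec(str)) !== null) {
    arr.push(m[2]);
  }
  return arr;
}

var demoStr = "büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)";
var demoWords = matchWordsExcept(demoStr, ["AND", "OR", "NOT"]);
console.log(JSON.stringify(demoWords));
// ["büert","3454jhadf","üasdfsdf","technüology","bar","bas"]
```

Like the dynamic version above, this sketch uses only the g flag, so the excluded words are matched case-sensitively.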
Explanation:
(^|[^\u00C0-\u017F\w]) - our custom leading boundary: matches either the string start (^) or any character outside the [\u00C0-\u017F\w] set
(?!(?:AND|OR|NOT)(?=[^\u00C0-\u017F\w]|$)) - a restriction on the match: the match fails if AND, OR or NOT comes next and is followed by the string end or by a character outside the [\u00C0-\u017F\w] set
([\u00C0-\u017F\w]+) - matches and captures one or more word characters ([a-zA-Z0-9_]) or characters from the \u00C0-\u017F range
(?=[^\u00C0-\u017F\w]|$) - the trailing boundary: either the string end ($) or a character outside the [\u00C0-\u017F\w] set.
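As a side note, engines that support ES2018 Unicode property escapes and lookbehind allow a shorter pattern that is not tied to the \u00C0-\u017F range. This is a swapped-in technique rather than the approach described above, and the sketch assumes such an engine is available; the variable names are illustrative.

```javascript
// Assumption: the engine supports \p{...} (needs the u flag) and lookbehind.
var wordChar = "[\\p{L}\\p{N}_]";   // any Unicode letter or number, or underscore
var unicodeRe = new RegExp(
  "(?<!" + wordChar + ")" +                      // no word character before the match
  "(?!(?:AND|OR|NOT)(?!" + wordChar + "))" +     // not a whole excluded keyword
  wordChar + "+",                                // the word itself
  "gu");

var input = "büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)";
var found = input.match(unicodeRe);
console.log(JSON.stringify(found));
// ["büert","3454jhadf","üasdfsdf","technüology","bar","bas"]
```

Because nothing is consumed before the word, the full match is the word itself and no capture group is needed.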