
Exclude array of words from regex match in JavaScript

I am given two variables, a string [var str] and an array of words [var exceptions]. I am replacing every middle character of every word longer than 3 letters with an asterisk using the following regular expression:

 var edited = str.replace(/\B\w\B/g, '*');

For example, the string "This is an example of what I am doing" would be returned as "T**s is an e*****e of w**t I am d***g".
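Note that `\B\w\B` matches a word character only when it has a word character on both sides, so three-letter words such as "the" are masked as well, not only words longer than three letters. A quick check on a made-up sample string:

```javascript
// \B\w\B matches a word character only when the characters on both sides
// are also word characters, i.e. every letter except the first and last.
const sample = "The cat sat on a mat";
const masked = sample.replace(/\B\w\B/g, "*");
console.log(masked);
```

Two-letter and one-letter words ("on", "a") have no inner character, so they are left alone.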

However, I would like to add exceptions to this regular expression. For example, given the array var exceptions = ["example","doing"], I would like the regex to return: "T**s is an example of w**t I am doing"

Does anyone know how to do this? If there is a way to achieve this using regex, great; if not, I am open to other suggestions.

Many thanks :)

You may use the exception words (I see they all consist of word chars) as an alternation group, capture it into Group 1, and then restore them inside a replace callback.

The regex will look like:

/\b(example|doing)\b|\B\w\B/g

See the JS demo:

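 var exceptions = ["example", "doing"];
 // Group 1 captures a whole exception word; otherwise match one inner word char.
 var rx = new RegExp("\\b(" + exceptions.join("|") + ")\\b|\\B\\w\\B", "g");
 var s = "This is an example of what I am doing";
 var res = s.replace(rx, function ($0, $1) {
   // Restore the captured exception word, mask anything else.
   return $1 ? $1 : '*';
 });
 console.log(res);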
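The demo works because every exception consists of word characters only. If an exception could contain regex metacharacters (the entry `"a.b"` below is a made-up example), the entries would need escaping before they are joined; otherwise `.` matches any character and `"axb"` would be treated as an exception too. A minimal sketch, where `escapeRegExp` is a hypothetical helper name and each exception is assumed to still start and end with a word character so the `\b` anchors line up:

```javascript
// Hypothetical helper: escape characters that have a special meaning in a
// RegExp so each exception is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumption: every exception begins and ends with a word character,
// otherwise the surrounding \b anchors will not match around it.
const exceptionWords = ["example", "a.b"];
const pattern = new RegExp(
  "\\b(" + exceptionWords.map(escapeRegExp).join("|") + ")\\b|\\B\\w\\B",
  "g"
);

const input = "an example: a.b versus axb";
// Keep a captured exception as-is, mask every other inner word character.
const output = input.replace(pattern, (m, word) => (word ? word : "*"));
console.log(output);
```

Here "a.b" is kept because it matches the escaped alternative exactly, while "axb" no longer matches it and gets masked like any other word.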

Pattern details:

  • \b(example|doing)\b - match a whole word example or doing and place it into capturing group #1 so it can be restored in the result later
  • | - or
  • \B\w\B - match a word char that has word chars on both sides (from the [a-zA-Z0-9_] set).
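To see how the two alternatives split the work, a small sketch can list every match the pattern produces on a short sample string; each match is either a whole exception word captured in group 1 or a single inner character with group 1 left undefined. It uses `String.prototype.matchAll`, which requires ES2020 and a regex with the `g` flag:

```javascript
// Same pattern as above, written as a regex literal.
const rx = /\b(example|doing)\b|\B\w\B/g;
const text = "This example";

// Each match is either a whole exception word (group 1 set)
// or a single inner word character (group 1 undefined).
const matches = [...text.matchAll(rx)].map(m => ({
  match: m[0],
  exception: m[1] !== undefined,
}));
console.log(matches);
```

"This" contributes only its inner letters "h" and "i", while "example" is consumed in one piece by the first alternative, so none of its letters are ever offered to `\B\w\B`.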

Split the sentence into separate words with .split(" "). Then, for each word, check whether it is in the array of exceptions: if it is, just add it to newString without changes; if it is not, apply your regex.

  var newString = "";
  var exceptions = ["test"];
  "this is a test".split(" ").forEach(word => {
    if (exceptions.includes(word))
      newString += word + " ";                          // exception: keep as-is
    else
      newString += word.replace(/\B\w\B/g, '*') + " ";  // mask inner chars
  });
  console.log(newString.trim());                        // drop the trailing space

I'd probably turn the array of excludes into a map (an object used as a lookup table) so that checking whether a word is excluded is faster. Then I'd use the fact that replace accepts a function as the replacement, and make the decision in there:

 var exclude = ["example", "what"];
 var str = "This is an example of what I am doing";
 // Prototype-less object used as a lookup table.
 var map = Object.create(null);
 exclude.forEach(function(entry) {
   map[entry] = true;
 });
 // c0 = first char, c1 = middle chars, c2 = last char of each word of 3+ chars.
 var edited = str.replace(/\b(\w)(\w+)(\w)\b/g, function(m, c0, c1, c2) {
   return map[m] ? m : c0 + "*".repeat(c1.length) + c2;
 });
 console.log(edited);

I've used String#repeat above, which is from ES2015 but can easily be shimmed for older browsers. Or use c1.replace(/./g, "*") instead.
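For environments without `String#repeat`, a sketch of the `c1.replace(/./g, "*")` alternative, otherwise the same object-map approach, could look like this (ES5 only):

```javascript
var exclude = ["example", "what"];
var str = "This is an example of what I am doing";

// Prototype-less object used as a simple lookup table.
var map = Object.create(null);
exclude.forEach(function (entry) {
  map[entry] = true;
});

// c0 = first char, c1 = middle chars, c2 = last char of each word of 3+ chars.
var edited = str.replace(/\b(\w)(\w+)(\w)\b/g, function (m, c0, c1, c2) {
  // ES5-friendly: turn each middle char into "*" without String#repeat.
  return map[m] ? m : c0 + c1.replace(/./g, "*") + c2;
});
console.log(edited);
```

Since `c1` only ever contains word characters here, `/./g` replaces each of them exactly once, giving the same result as `"*".repeat(c1.length)`.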


Here's an ES2015+ version, using a Set rather than an object map:

 let exclude = ["example", "what"];
 let str = "This is an example of what I am doing";
 let set = new Set(exclude); // the Set constructor accepts any iterable
 let edited = str.replace(/\b(\w)(\w+)(\w)\b/g, (m, c0, c1, c2) =>
   set.has(m) ? m : c0 + "*".repeat(c1.length) + c2
 );
 console.log(edited);

You could do it this way, assuming that words are always separated by spaces and nothing else:

 var str = "This is an example of what I am doing";
 var exceptions = ["example", "doing"];
 var edited = str.split(' ').map(function(w) {
   // Keep exception words, mask the inner letters of all others.
   return exceptions.indexOf(w) !== -1 ? w : w.replace(/\B\w\B/g, '*');
 }).join(' ');
 console.log(edited);
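If words may also be followed by punctuation or separated by tabs or line breaks, `split(' ')` would hand the check strings like "example," that never equal an exception. One hedged alternative that drops the spaces-only assumption is to let `replace` find each run of word characters and decide per word, combining this membership check with the callback idea from the other answers. The sample sentence with punctuation is made up for illustration, and, as elsewhere on this page, `\w` only covers `[A-Za-z0-9_]`:

```javascript
var str = "This is an example, of what I am doing.";
var exceptions = ["example", "doing"];

// \w+ finds each run of word characters, so punctuation and any kind of
// whitespace between words is left untouched.
var edited = str.replace(/\w+/g, function (w) {
  return exceptions.indexOf(w) !== -1 ? w : w.replace(/\B\w\B/g, "*");
});
console.log(edited);
```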

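Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.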

 
© 2020-2024 STACKOOM.COM