JavaScript regular expression to replace special characters, but allow a whitelist, using XRegExp

I want to remove most special characters from a string in JavaScript, but keep some special cases such as c++ and c#. I have experimented with the XRegExp library in Node.js and, as far as I can tell, I am able to remove everything that is not a letter or a number. Letters from non-English alphabets should be kept as well. This is what I have so far:

  var str = "I do programming in c++ and sometimes c#, but + and # should be removed";
  // Match any run of characters that are not whitespace, numbers (\p{N}) or letters (\p{L}).
  var regex = XRegExp('[^\\s\\p{N}\\p{L}]+', 'g');
  var replaced = XRegExp.replace(str, regex, "");
  console.log(replaced);

This outputs

I do programming in c and sometimes c, but and should be removed

I need to create some kind of list with allowed words, like c++ and c#. Desired output is:

I do programming in c++ and sometimes c#, but and should be removed

You can just use alternations inside a capturing group and then restore this text with a backreference in the replacement pattern:

  var str = "I do programming in c++ and sometimes c#, but + and # should be removed";
  // Group 1 captures a whitelisted term; the second alternative matches the characters to strip.
  var regex = XRegExp('(\\b(?:c[+]{2}|c#)(?!\\w))|[^\\s\\p{N}\\p{L}]+', 'ig');
  // "$1" puts a captured term back; it is empty when the second alternative matched.
  var replaced = XRegExp.replace(str, regex, "$1");
  console.log(replaced);
 <script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-all-min.js"></script> 

Note that I added the i flag to make the pattern case-insensitive, a \b at the start of the alternation so that it only matches at a word boundary (c++ and c# both start with a letter, which is a word character), and the lookahead (?!\w), which makes sure that no word character follows + or # (\b would not work there, because + and # are not word characters). Inside the JavaScript string literal these are written as \\b and \\w.
