JavaScript regular expression replace - why does one work, but this other not?

I grabbed the following JavaScript regular expression replace from another site to strip out some invalid characters:

str = str.replace(/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g,'');

However, I noticed it wasn't catching occurrences of \u00B7 (the ISO-8859-1 center dot character, ·).

If I do it in two steps, however, it works:

str = str.replace(/\u00B7/g,'');
str = str.replace(/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g,'');

The first replace seems to be included in the second. Can somebody explain why the second line doesn't work by itself? Thanks.

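The first and second patterns are completely different. Pattern one replaces \u00B7 (·), while the second pattern replaces all characters NOT listed in the character class. Because \u00B7 is listed inside the negated class, the second pattern keeps it. Removing the caret would not help: it would invert the class and strip every listed character instead. Remove \u00B7 from pattern two and that should fix your issue.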

Just to be clear:

/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/

matches all characters not in the set. So to match \u00B7 (and have it replaced with ''), remove it from the pattern:

/[^\u000D\u0020-\u007E\u00A2-\u00A4]/
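A minimal sketch of the difference between the two classes. The `sample` string and the variable names are made up for illustration; only the two patterns come from the question and answer:

```javascript
// Hypothetical sample input: contains a center dot (U+00B7) and a tab (U+0009).
const sample = "a\u00B7b\tc";

// Original class: U+00B7 is listed inside the negated set, so it is kept.
const original = /[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g;

// Revised class: U+00B7 is no longer listed, so it is stripped
// like any other character outside the set.
const revised = /[^\u000D\u0020-\u007E\u00A2-\u00A4]/g;

const keptDot = sample.replace(original, "");    // "a·bc" (tab removed, dot kept)
const strippedDot = sample.replace(revised, ""); // "abc"  (tab and dot removed)
```

In both cases the tab disappears, because it was never in the set; only the treatment of the center dot changes.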

The ASCII character set is listed at http://www.asciitable.com/; that is likely the set you want to keep. The range \u0020-\u007E (space through ~) covers most of the common characters of interest; the others are typically not wanted.

\u000D is a carriage return. I would investigate whether you really need \u00A2, \u00A3 and \u00A4 (¢, £ and ¤).
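To check what the remaining entries in the class actually let through, something like the following could be run. This is a sketch under assumptions: the `line` input is invented, and the pattern is the version without \u00B7:

```javascript
// The three characters covered by the range U+00A2 to U+00A4.
const extras = [0x00A2, 0x00A3, 0x00A4].map((cp) => String.fromCharCode(cp)); // ["¢", "£", "¤"]

const pattern = /[^\u000D\u0020-\u007E\u00A2-\u00A4]/g;

// Hypothetical input ending in a Windows-style line break (\r\n).
const line = "price: 5\u00A3\r\n";

// The line feed (U+000A) is not in the set and is removed;
// the carriage return (U+000D) is in the set and stays.
const cleaned = line.replace(pattern, ""); // "price: 5£\r"
```

If bare carriage returns left behind like this are not wanted, \u000D can be dropped from the class as well.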
