
Remove zero-width space characters from a JavaScript string

I take user input (JS code) and execute (process) it in real time to show some output.

Sometimes the code contains zero-width spaces, which is really weird. I don't know how the users are entering them. Example: "(\u200B$".length === 3 (the \u200B stands for the invisible character between the parenthesis and the dollar sign).

I need to be able to remove that character from the code in JS. How do I do that? Or is there some other way to execute the JS code so that the browser doesn't take the zero-width space characters into account?

Unicode has the following zero-width characters:

  • U+200B zero width space
  • U+200C zero width non-joiner
  • U+200D zero width joiner
  • U+FEFF zero width no-break space
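
Before removing anything, it can help to confirm which of these characters are actually present and where. The following is a minimal sketch; the helper name findZeroWidthChars is hypothetical and not part of any library.

```javascript
// Hypothetical helper (an assumption for illustration): report the position
// and code point of every zero-width character listed above.
function findZeroWidthChars(str) {
  var pattern = /[\u200B-\u200D\uFEFF]/g;
  var found = [];
  var match;
  // exec() with the "g" flag advances through the string one match at a time.
  while ((match = pattern.exec(str)) !== null) {
    found.push({
      index: match.index,
      codePoint: 'U+' + match[0].charCodeAt(0).toString(16).toUpperCase()
    });
  }
  return found;
}

var sample = '(\u200B$';
console.log(sample.length);              // 3
console.log(findZeroWidthChars(sample)); // [ { index: 1, codePoint: 'U+200B' } ]
```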

To remove them from a string in JavaScript, you can use a simple regular expression:

var userInput = 'a\u200Bb\u200Cc\u200Dd\uFEFFe';
console.log(userInput.length); // 9
var result = userInput.replace(/[\u200B-\u200D\uFEFF]/g, '');
console.log(result.length); // 5

Note that there are many more characters that may not be visible, for example some of ASCII's control characters.
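
If a broader cleanup is wanted, one possible approach is to match by Unicode general category instead of listing code points. The sketch below assumes an engine that supports Unicode property escapes (ES2018 or later), and stripInvisible is a hypothetical name. It removes "format" characters (category Cf, which includes U+200B to U+200F and U+FEFF) and ASCII control characters other than tab, line feed and carriage return.

```javascript
// A sketch, not a definitive list of invisible characters.
// Assumption: the engine supports \p{...} with the "u" flag (ES2018+).
function stripInvisible(str) {
  return str
    // Unicode "format" characters, e.g. zero-width space, joiners, BOM.
    .replace(/\p{Cf}/gu, '')
    // ASCII control characters, keeping \t (0x09), \n (0x0A) and \r (0x0D).
    .replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '');
}

console.log(stripInvisible('a\u200Bb\u0000c\uFEFFd\te').length); // 6
```

Category Cf also covers characters that carry meaning in ordinary text, such as the zero width joiner inside some emoji sequences, so this is probably better suited to source code than to arbitrary user text.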

I had a problem where some invisible characters were corrupting my JSON and causing an "Unexpected token ILLEGAL" exception, which was crashing my site.

Here is my solution using a RegExp variable:

    // The "g" flag replaces every occurrence, not just the first one.
    var re = new RegExp("\u2028|\u2029", "g");
    var result = text.replace(re, '');

You can find more about JavaScript and zero-width spaces here: Zero Width Spaces
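
For background: U+2028 and U+2029 are allowed unescaped inside JSON strings, but before ES2019 JavaScript treated them as line terminators, so JSON pasted or evaluated as script source could fail to parse. If the characters should be preserved rather than dropped, one option is to escape them. This is only a sketch, and escapeLineSeparators is a hypothetical name.

```javascript
// Hypothetical helper: turn raw U+2028 / U+2029 into their \uXXXX escapes
// so the JSON text is also safe to embed in older JavaScript source.
function escapeLineSeparators(jsonText) {
  return jsonText
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

var json = JSON.stringify({ note: 'line\u2028break\u2029end' });
var safe = escapeLineSeparators(json);

console.log(/[\u2028\u2029]/.test(safe));                          // false
console.log(JSON.parse(safe).note === 'line\u2028break\u2029end'); // true
```

Because the escapes are decoded again by JSON.parse, the data round-trips unchanged.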

// filter returns an array of characters, so join them back into a string.
[].filter.call( str, function( c ) {
    return c.charCodeAt( 0 ) !== 8203;
} ).join( '' );

Filter each character to remove char code 8203 (the Unicode number of the zero-width space).

str.replace(/\u200B/g,'');

200B is the hexadecimal form of 8203, the zero-width space's code. Replace it with an empty string to remove it.
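
The relationship between the two notations can be checked directly. A small sketch (the variable name ZWSP is just for illustration):

```javascript
// 0x200B (hexadecimal) and 8203 (decimal) are the same number.
var ZWSP = String.fromCharCode(8203);

console.log(0x200B === 8203);                        // true
console.log(ZWSP === '\u200B');                      // true
console.log('a\u200Bb'.replace(/\u200B/g, '') === 'ab'); // true
```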

If you are trying to do this in JavaScript, try this regex.

/([\u200B]+|[\u200C]+|[\u200D]+|[\u200E]+|[\u200F]+|[\uFEFF]+)/g

submit.onclick = evt => {
  const stringToTrim = stringValue.value;
  zeroWidthTrim(stringToTrim);
}

/**
 * Given a string, when it has zero-width spaces in it, then remove them
 *
 * @param {String} stringToTrim The string to be trimmed of unicode spaces
 *
 * @return the trimmed string
 *
 * Regex for zero-width space Unicode characters.
 *
 * U+200B zero-width space.
 * U+200C zero-width non-joiner.
 * U+200D zero-width joiner.
 * U+200E left-to-right mark.
 * U+200F right-to-left mark.
 * U+FEFF zero-width non-breaking space.
 */
function zeroWidthTrim(stringToTrim) {
  const ZERO_WIDTH_SPACES_REGEX = /([\u200B]+|[\u200C]+|[\u200D]+|[\u200E]+|[\u200F]+|[\uFEFF]+)/g;
  console.log('stringToTrim = ' + stringToTrim);
  const trimmedString = stringToTrim.replace(ZERO_WIDTH_SPACES_REGEX, '');
  console.log('trimmedString = ' + trimmedString);
  return trimmedString;
};

<form runat="server">
  <input name="stringValue" id="stringValue" type="text" placeholder="enter your string" value="[&#x200b;&#x200c;]" />
  <input type="button" value="remove zero-width characters" id="submit" />
</form>

(Once you have run the above code snippet, paste the stringToTrim value and the trimmedString value into the regex101 test window and you will see that the Unicode character has gone from the trimmedString value.)
