
JavaScript remove ZERO WIDTH SPACE (unicode 8203) from string

I'm writing some JavaScript that processes website content. My efforts are being thwarted by the SharePoint text editor's tendency to put the "zero width space" character in the text when the user presses backspace. The character's unicode value is 8203, or B200 in hexadecimal. I've tried to use the default "replace" function to get rid of it. I've tried many variants, none of them worked:

var a = "o​m"; //the invisible character is between o and m

var b = a.replace(/\u8203/g,'');
b = a.replace(/\uB200/g,'');
b = a.replace("\\uB200",'');

and so on and so forth. I've tried quite a few variations on this theme. None of these expressions work (tested in Chrome and Firefox). The only thing that works is typing the actual character in the expression:

var b = a.replace("​",''); //it's there, believe me

This poses potential problems. The character is invisible, so that line in itself doesn't make sense. I can get around that with comments. But if the code is ever reused and the file is saved using a non-Unicode encoding (or when it's deployed to SharePoint, where there's no guarantee it won't mess up the encoding), it will stop working. Is there a way to write this using the unicode notation instead of the character itself?

[My ramblings about the character]

In case you haven't met this character (and you probably haven't, seeing as it's invisible to the naked eye, unless it broke your code and you discovered it while trying to locate the bug), it's a real a-hole that will cause certain types of pattern matching to malfunction. I've caged the beast for you:

[​] <- careful, don't let it escape.

If you want to see it, copy those brackets into a text editor and then step your cursor through them. You'll notice that you need three steps to pass what looks like 2 characters, and your cursor will appear to skip a step in the middle.

The number in a unicode escape should be in hex, and the hex for 8203 is 200B (which is indeed a Unicode zero-width space), so:

var b = a.replace(/\u200B/g,'');

Live Example:

var a = "o​m"; //the invisible character is between o and m
var b = a.replace(/\u200B/g,'');
console.log("a.length = " + a.length);      // 3
console.log("a === 'om'? " + (a === 'om')); // false
console.log("b.length = " + b.length);      // 2
console.log("b === 'om'? " + (b === 'om')); // true
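
The attempts in the question were valid escapes, just for other characters: \u8203 and \uB200 name the code points with hex values 8203 and B200, not decimal 8203, and "\\uB200" is a plain six-character string starting with a backslash. A minimal sketch for checking the decimal-to-hex relationship, assuming only standard built-ins (the names ZWSP_DECIMAL, hex, zwsp, sample and cleaned are illustrative, not from the question):

```javascript
// 8203 is the decimal code; \u escapes take four hex digits.
const ZWSP_DECIMAL = 8203;
const hex = ZWSP_DECIMAL.toString(16).toUpperCase().padStart(4, "0");

// Build the character from its decimal code so that no invisible
// literal has to appear in the source file.
const zwsp = String.fromCharCode(ZWSP_DECIMAL);

const sample = "o" + zwsp + "m";
const cleaned = sample.replace(/\u200B/g, "");

console.log(hex);                  // 200B
console.log(zwsp === "\u200B");    // true
console.log(sample.charCodeAt(1)); // 8203
console.log(cleaned);              // om
```

Because the source then contains only ASCII, the file's encoding should no longer matter for this replacement.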

The accepted answer didn't work for my case.

But this one did:

text.replace(/(^[\s\u200b]*|[\s\u200b]*$)/g, '')
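
Note that this pattern is anchored with ^ and $, so it strips whitespace and zero-width spaces only from the start and end of the string; a zero-width space in the middle stays. A rough sketch contrasting the two patterns, where trimZeroWidth, removeZeroWidth and padded are hypothetical names used only for illustration:

```javascript
// Hypothetical helpers; only the regular expressions come from the answers.
function trimZeroWidth(text) {
  // Strips whitespace and U+200B at the start and end only.
  return text.replace(/(^[\s\u200b]*|[\s\u200b]*$)/g, "");
}

function removeZeroWidth(text) {
  // Strips every U+200B, wherever it appears; other whitespace is kept.
  return text.replace(/\u200B/g, "");
}

// U+200B, two spaces, "o", U+200B, "m", one space, U+200B
const padded = "\u200B  o\u200Bm \u200B";

console.log(trimZeroWidth(padded).length);   // 3: the inner U+200B is kept
console.log(removeZeroWidth(padded).length); // 5: the spaces are kept
```

Which one fits depends on whether the stray characters only pad the text or can also appear inside it; the two replacements can be chained if both cases occur.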
