
How to strip (or regex match) a Unicode character from a string in JavaScript?

A website I'm modding with a userscript has some text I want to modify. The text appears to have a Unicode character in it. When I look at it on screen, or even extract it to a variable with jQuery, it looks like this:

2 others

However, if I create my own variable with that same text and compare the two, the comparison comes up false. So I copied and pasted the site's text into vim, where it looks like this:

2<200e> others

As best I can tell, this is a Unicode character for a space (?). I want to be able to match this string with a regex such as:

^(\d+(?:,\d+)*)\s+(.*)

but it fails on this string with the embedded Unicode character. (It works fine on my own typed text of '2 others'.)

Is there some way I can strip this Unicode character out of the text? I tried the following, to no avail:

text.replace('\‎\\','')

text.replace('200e','')

text.replace('\\%20','')

text.replace('\\%u200e','')

Or, alternatively, can I adjust my regex to match either '2 others' or the same text with the embedded 200e Unicode char?

Try using an actual regex instead of a string pattern. The \u200e escape matches the character itself, and the g flag removes every occurrence.

text = text.replace(/\u200e/g, '');
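As a quick check of the replacement above, here is a minimal sketch. It assumes the site's text is '2', then U+200E, then ' others', as the vim output suggests. U+200E is the LEFT-TO-RIGHT MARK, an invisible format character rather than a space. The names siteText, typedText and stripLrm are made up for illustration.

```javascript
// Assumed sample: the site's text with a LEFT-TO-RIGHT MARK (U+200E) after the digit.
const siteText = '2\u200e others';
// The same text typed by hand, without the invisible character.
const typedText = '2 others';

// Hypothetical helper: removes every U+200E from the input string.
function stripLrm(text) {
  return text.replace(/\u200e/g, '');
}

const cleaned = stripLrm(siteText);

console.log(siteText === typedText); // false: the invisible character is still there
console.log(cleaned === typedText);  // true: the strings match once it is stripped
```

If other invisible marks turn up in the same text, the character class in the helper would need to be widened to cover them. Which marks to include depends on the site, so that is left open here.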

can I adjust my regex to match either '2 others' or the same text with the embedded 200e unicode char?

You could just change the \s in your regex to include U+200E as well, e.g.

^(\d+(?:,\d+)*)[\s\u200e]+(.*)
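To see why the original pattern fails and the adjusted one works, here is a small sketch. The \s class covers whitespace, but U+200E is a format character, so \s+ cannot match it. The sample strings and variable names are assumptions for illustration only.

```javascript
// The question's pattern and the adjusted pattern from the answer.
const originalPattern = /^(\d+(?:,\d+)*)\s+(.*)/;
const adjustedPattern = /^(\d+(?:,\d+)*)[\s\u200e]+(.*)/;

// Assumed samples: plain typed text, and text with U+200E after the number.
const samples = ['2 others', '2\u200e others', '1,234\u200e others'];

// Collect the two capture groups for each sample, or null when there is no match.
const captures = samples.map((s) => {
  const m = adjustedPattern.exec(s);
  return m ? [m[1], m[2]] : null;
});

console.log(originalPattern.test('2\u200e others')); // false: \s does not cover U+200E
console.log(captures); // each sample yields [number, rest], e.g. ['2', 'others']
```

Because the character class consumes the mark along with the space, the second capture group comes back without the invisible character in front of it.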
