Splitting a string containing the character “𠮟”
Running the following code in Node 14.3.0
const data = 'ABCDE𠮟漢字でも大丈夫';
console.log(data);
console.log(data.split(''));
returns
ABCDE𠮟漢字でも大丈夫
[
'A', 'B', 'C', 'D',
'E', '�', '�', '漢',
'字', 'で', 'も', '大',
'丈', '夫'
]
Why is the 𠮟 character not being split properly? I have tested all jouyou kanji, and this character is the only one that yields this result.
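JavaScript strings are sequences of UTF-16 code units, and split('') cuts the string between every code unit. 𠮟 (U+20B9F) lies outside the Basic Multilingual Plane (BMP), so it is stored as a surrogate pair of two code units, and split('') tears that pair apart into the two '�' entries you see. The other jouyou kanji you tested all fit in a single code unit, which is why only this character is affected. split was designed before full Unicode support (and UTF-8) was widely adopted, and to avoid breaking existing applications it was decided not to alter its UTF-16-based behavior. Luckily, ES2015 added code-point-aware string iteration, which Array.from uses, for coping with this.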
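To make the surrogate pair visible, here is a small sketch that inspects how 𠮟 is stored. It writes the character as the escape \u{20B9F} so that copy-paste cannot silently swap it for 叱. The helper toSurrogatePair is hypothetical, added only to show the standard UTF-16 encoding formula; everything else is built-in String API.

```javascript
// 𠮟 written as an escape so the example does not depend on copy-paste fidelity.
const kanji = '\u{20B9F}';

// Hypothetical helper: the standard UTF-16 formula for code points above U+FFFF.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000; // 20 bits left to distribute
  return [
    0xd800 + (offset >> 10),  // high (lead) surrogate: top 10 bits
    0xdc00 + (offset & 0x3ff), // low (trail) surrogate: bottom 10 bits
  ];
}

const hex = (n) => n.toString(16).toUpperCase();

console.log(kanji.length); // 2 (code units, not characters)
console.log([kanji.charCodeAt(0), kanji.charCodeAt(1)].map(hex)); // [ 'D842', 'DF9F' ]
console.log(toSurrogatePair(0x20b9f).map(hex)); // [ 'D842', 'DF9F' ]
console.log(hex(kanji.codePointAt(0))); // 20B9F (codePointAt rejoins the pair)
console.log('\u6f22'.length); // 1 (漢 is in the BMP)
```

charCodeAt reads single code units, while codePointAt recombines a surrogate pair into the real code point, which is the distinction split('') ignores.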
So for your example you can use Array.from or the array spread syntax.
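A minimal side-by-side sketch on the question's string, comparing split('') with Array.from and spread. The variable names are illustrative only.

```javascript
const data = 'ABCDE\u{20B9F}漢字でも大丈夫'; // \u{20B9F} is 𠮟

const byCodeUnit = data.split('');    // one element per UTF-16 code unit
const byCodePoint = Array.from(data); // string iterator: one element per code point
const bySpread = [...data];           // spread uses the same string iterator

console.log(byCodeUnit.length);  // 14
console.log(byCodePoint.length); // 13
console.log(bySpread.length);    // 13

// split('') leaves two lone surrogates where 𠮟 was...
console.log(byCodeUnit[5] === '\ud842', byCodeUnit[6] === '\udf9f'); // true true
// ...while Array.from and spread keep the pair together.
console.log(byCodePoint[5] === '\u{20B9F}'); // true
```

Note that this iterates by code point, not by user-perceived character: a kanji followed by a variation selector (as some Japanese text uses) would still come out as two elements.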
RegExp also has the /u flag for Unicode; I've included that too.
For example:
const data = 'ABCDE𠮟漢字でも大丈夫';
console.log([...data]);
console.log(data.match(/./gu));
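A short sketch of what the u flag changes for ".", using 𠮟 alone. It also shows one caveat worth knowing: "." does not match line terminators unless the s (dotAll) flag is added, so for arbitrary text the spread or Array.from approach is the simpler choice.

```javascript
const kanji = '\u{20B9F}'; // 𠮟, a surrogate pair in UTF-16

// Without u, "." matches one UTF-16 code unit, so the pair is split in two.
const withoutU = kanji.match(/./g);
// With u, "." matches a whole code point.
const withU = kanji.match(/./gu);

console.log(withoutU.length); // 2
console.log(withU.length);    // 1

// Anchored check: is the string exactly one character?
console.log(/^.$/.test(kanji));  // false
console.log(/^.$/u.test(kanji)); // true

// "." still skips line terminators unless the s (dotAll) flag is added.
console.log('a\nb'.match(/./gu).length);  // 2
console.log('a\nb'.match(/./gsu).length); // 3
```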
Use "叱" (U+53F1) instead of "𠮟" (U+20B9F). They are variant forms of the same kanji, and 叱 is in the BMP, so split('') handles it correctly.
const data = 'ABCDE叱漢字でも大丈夫';
console.log(data);
console.log(data.split(''));
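If the text comes from elsewhere, one way to apply this suggestion is to substitute the character before splitting. The sketch below assumes that changing the glyph form is acceptable for your use case, since the substitution is lossy; the function name toBmpShitsu is hypothetical.

```javascript
// Hypothetical helper: replace the non-BMP form 𠮟 (U+20B9F) with the BMP form 叱 (U+53F1).
// Lossy: the original glyph form is not preserved.
function toBmpShitsu(text) {
  // \u{...} escapes inside a regex require the u flag.
  return text.replace(/\u{20B9F}/gu, '\u53f1');
}

const data = 'ABCDE\u{20B9F}漢字でも大丈夫';
const converted = toBmpShitsu(data);

console.log(converted.split('').length);          // 13: every character is now one code unit
console.log(converted.split('')[5] === '\u53f1'); // true
```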