Splitting a string containing the UTF-8 character “叱”

Running the following code in Node 14.3.0

const data = 'ABCDE𠮟漢字でも大丈夫';
console.log(data);
console.log(data.split(''));

returns

ABCDE𠮟漢字でも大丈夫
[
  'A',  'B',  'C',  'D',
  'E',  '�',  '�',  '漢',
  '字', 'で', 'も', '大',
  '丈', '夫'
]

Why is the 叱 character not being split properly? I have tested all jouyou kanji, and this character is the only one that yields this result.

JavaScript's split came before UTF-8 was widely adopted, and it operates on UTF-16 code units. To prevent breaking existing applications, it was decided not to alter this UTF-16-based implementation. The character in your string lies outside the Basic Multilingual Plane, so it is stored as two UTF-16 code units (a surrogate pair), and split('') separates the two halves. Luckily, ES2015 introduced Array.from, which iterates a string by code point, for coping with this.
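To make the surrogate-pair point concrete, here is a minimal sketch that inspects the character on its own. It assumes the character in the question is U+20B9F, written below as an escape so the code point is unambiguous; the variable names are illustrative only.

```javascript
// Assumption: the problematic character is U+20B9F (𠮟).
const ch = '\u{20B9F}';

// length counts UTF-16 code units, not characters.
const unitCount = ch.length;

// The two code units form a high/low surrogate pair.
const units = [ch.charCodeAt(0), ch.charCodeAt(1)].map((u) => u.toString(16));

// codePointAt reads the whole pair as a single code point.
const codePoint = ch.codePointAt(0).toString(16);

console.log(unitCount); // 2
console.log(units);     // [ 'd842', 'df9f' ]
console.log(codePoint); // 20b9f
```

The two values in units are the halves that split('') returns separately, which a terminal renders as replacement characters.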

So for your example you can use Array.from or the array spread syntax.

RegExp also has the u flag for Unicode, so I've included that as well.

e.g.

const data = 'ABCDE𠮟漢字でも大丈夫';
// Spread iterates the string by code point
console.log([...data]);
// With the u flag, . matches a whole code point
console.log(data.match(/.{1}/ug));
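The answer mentions Array.from without showing it, so the following sketch compares it with split(''). It assumes the same string as in the question, with the problematic character written as the escape \u{20B9F}; the names byCodeUnit and byCodePoint are made up for illustration.

```javascript
// Assumption: same string as the question; \u{20B9F} is 𠮟.
const data = 'ABCDE\u{20B9F}漢字でも大丈夫';

// split('') cuts between UTF-16 code units, breaking the surrogate pair.
const byCodeUnit = data.split('');

// Array.from uses the string iterator, which yields whole code points.
const byCodePoint = Array.from(data);

console.log(byCodeUnit.length - byCodePoint.length); // 1 (the extra surrogate half)
console.log(byCodePoint[5] === '\u{20B9F}');         // true
```

Note that this splits by code point, not by user-perceived character: sequences built from several code points, such as a base letter followed by a combining mark, would still come apart.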

Use "叱" (U+53F1) instead of "𠮟" (U+20B9F). They are variant forms of the same kanji, and 叱 fits in a single UTF-16 code unit.

const data = 'ABCDE叱漢字でも大丈夫';
console.log(data);
console.log(data.split(''));
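As a rough check of why this substitution works, the sketch below compares the two forms. It assumes the form used in the question is U+20B9F and the replacement is U+53F1; bmpForm and astralForm are illustrative names.

```javascript
// Assumption: the two variants are U+53F1 (叱) and U+20B9F (𠮟).
const bmpForm = '\u53F1';
const astralForm = '\u{20B9F}';

// Different code points, so the strings are not equal.
console.log(bmpForm === astralForm);      // false

// The BMP form is one UTF-16 code unit, so split('') leaves it intact.
console.log(bmpForm.split('').length);    // 1

// The astral form is two code units, so split('') breaks it in two.
console.log(astralForm.split('').length); // 2
```

Since the two are distinct code points, swapping one for the other changes the data. That may or may not be acceptable depending on whether the text must preserve the original form.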
