Splitting a string that contains the UTF-8 character "𠮟"

Question

Running the following code in Node 14.3.0

const data = 'ABCDE𠮟漢字でも大丈夫';
console.log(data);
console.log(data.split(''));

returns

ABCDE𠮟漢字でも大丈夫
[
  'A',  'B',  'C',  'D',
  'E',  '�',  '�',  '漢',
  '字', 'で', 'も', '大',
  '丈', '夫'
]

Why is the 𠮟 character not being split properly? I have tested all jouyou kanji, and this character is the only one that yields this result.

Answer 1

JavaScript's split was designed around UTF-16, not UTF-8. A JavaScript string is a sequence of UTF-16 code units, and split('') cuts between every code unit. A character outside the Basic Multilingual Plane, such as 𠮟 (U+20B9F), is stored as a surrogate pair of two code units, so it gets cut in half. To avoid breaking existing applications, it was decided not to alter this behavior. Luckily, ES2015 added string iteration by code point, which Array.from and the spread syntax both use.
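To make the surrogate pair visible, here is a minimal sketch. It assumes the character in the question is U+20B9F, written as an escape so the code does not depend on file encoding; the variable names are only for illustration.

```javascript
// '\u{20B9F}' is the character 𠮟, which lies outside the
// Basic Multilingual Plane (BMP).
const kanji = '\u{20B9F}';

// length counts UTF-16 code units, not characters.
console.log(kanji.length); // 2

// The two code units are the high and low surrogate.
const high = kanji.charCodeAt(0).toString(16);
const low = kanji.charCodeAt(1).toString(16);
console.log(high, low); // d842 df9f

// codePointAt reads the whole pair as one code point.
const codePoint = kanji.codePointAt(0).toString(16);
console.log(codePoint); // 20b9f

// split('') separates the pair into two lone surrogates,
// which the console cannot display as a real character.
const halves = kanji.split('');
console.log(halves.length); // 2
```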

So for your example, you can use Array.from or the array spread syntax.

RegExp also has the /u flag for Unicode-aware matching, so I've included that as well.

For example:

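const data = 'ABCDE𠮟漢字でも大丈夫';
// Spread iterates the string by code point, so 𠮟 stays whole.
console.log([...data]);
// With the u flag, . matches a full code point instead of a code unit.
console.log(data.match(/.{1}/ug));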
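The answer mentions Array.from but only demonstrates the spread syntax, so here is a small sketch of the Array.from variant next to split(''). It assumes the question's sample string, with 𠮟 written as the escape \u{20B9F}; the variable names are made up for the comparison.

```javascript
// Sample string from the question; \u{20B9F} is the character 𠮟.
const data = 'ABCDE\u{20B9F}漢字でも大丈夫';

// split('') works on UTF-16 code units.
const byCodeUnit = data.split('');

// Array.from uses the string iterator, which walks code points.
const byCodePoint = Array.from(data);

console.log(byCodeUnit.length);  // 14
console.log(byCodePoint.length); // 13

// The sixth element is the intact character.
console.log(byCodePoint[5] === '\u{20B9F}'); // true

// Joining the pieces gives back the original string.
console.log(byCodePoint.join('') === data); // true
```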

Answer 2

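Use "叱" (U+53F1) instead of "𠮟" (U+20B9F). They are variant forms of the same kanji, but 叱 lies inside the Basic Multilingual Plane, so it is a single UTF-16 code unit and split('') handles it.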

const data = 'ABCDE叱漢字でも大丈夫';
console.log(data);
console.log(data.split(''));
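Before swapping one character for the other, it may help to see what actually changes. The sketch below assumes the character in the question is U+20B9F and the suggested replacement is U+53F1; the variable names are hypothetical.

```javascript
const bmpVariant = '\u53F1';    // 叱, inside the BMP
const jouyouForm = '\u{20B9F}'; // 𠮟, outside the BMP

// One code unit versus a surrogate pair.
console.log(bmpVariant.length); // 1
console.log(jouyouForm.length); // 2

// The two are different code points, so they do not compare equal.
console.log(bmpVariant === jouyouForm); // false

// Substituting makes split('') work, but the result is a different string.
const original = 'ABCDE\u{20B9F}漢字でも大丈夫';
const substituted = original.replace(jouyouForm, bmpVariant);
console.log(substituted.split('').length); // 13
console.log(substituted === original);     // false
```

So the substitution is only safe when the exact code point does not matter, for example when the text is not later compared against data that still uses U+20B9F.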
