Splitting a string containing the UTF-8 character “叱”

Question

Running the following code in Node.js 14.3.0

const data = 'ABCDE𠮟漢字でも大丈夫';
console.log(data);
console.log(data.split(''));

prints:

ABCDE𠮟漢字でも大丈夫
[
  'A',  'B',  'C',  'D',
  'E',  '�',  '�',  '漢',
  '字', 'で', 'も', '大',
  '丈', '夫'
]

Why is the 𠮟 character (a variant of 叱) not being split properly? I have tested all jouyou kanji, and this character is the only one that yields this result.

Answer 1

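JavaScript's split dates from when strings were treated as sequences of 16-bit code units (UTF-16), and split('') cuts between every code unit. 𠮟 (U+20B9F) lies outside the Basic Multilingual Plane (BMP), so it is stored as a surrogate pair of two code units. split('') separates the pair into two lone surrogates, which print as �. This is also why it is the only jouyou kanji you found affected. To avoid breaking existing applications, it was decided not to change this UTF-16-based implementation. Luckily, ES2015 added code-point-aware alternatives such as Array.from for coping with this.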
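A minimal sketch of what split('') actually sees: it lists every UTF-16 code unit of a string as hex, so the two surrogate halves of 𠮟 become visible. The helper name codeUnits is hypothetical and only used for illustration.

```javascript
// Hypothetical helper: return every UTF-16 code unit of a string as a hex string.
function codeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    // charCodeAt(i) reads one 16-bit code unit, which is exactly what split('') cuts between
    units.push(str.charCodeAt(i).toString(16).toUpperCase());
  }
  return units;
}

const data = 'ABCDE𠮟漢字でも大丈夫';
console.log(data.length);                       // 14 code units for 13 visible characters
console.log(codeUnits('𠮟'));                   // [ 'D842', 'DF9F' ]: a surrogate pair
console.log('𠮟'.codePointAt(0).toString(16)); // '20b9f': the real code point
```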

So for your example, you can use Array.from or array spread syntax ([...data]).
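Both Array.from(data) and [...data] use the string iterator, which walks a string by code point rather than by code unit. The sketch below approximates that behavior by hand: a valid surrogate pair is kept together, while anything else, including a lone surrogate, is taken one unit at a time. The function splitByCodePoint is hypothetical and only illustrates the idea; in real code, Array.from or spread is the simpler choice.

```javascript
// Hypothetical illustration of code-point iteration, approximating the string iterator.
function splitByCodePoint(str) {
  const result = [];
  let i = 0;
  while (i < str.length) {
    // codePointAt combines a valid surrogate pair into one code point above 0xFFFF
    const cp = str.codePointAt(i);
    const width = cp > 0xFFFF ? 2 : 1; // a pair spans two code units
    result.push(str.slice(i, i + width));
    i += width;
  }
  return result;
}

const data = 'ABCDE𠮟漢字でも大丈夫';
console.log(splitByCodePoint(data));                   // 13 elements, '𠮟' kept intact
console.log(splitByCodePoint(data).join('') === data); // true
```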

Regular expressions also have the /u (Unicode) flag, which makes "." match a whole code point instead of a single code unit. I've included that as well.

For example:

 const data = 'ABCDE𠮟漢字でも大丈夫';
 console.log([...data]);         // split by code point
 console.log(Array.from(data));  // same result
 console.log(data.match(/./gu)); // /u: "." matches a code point
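For comparison, a small sketch running the same regular expression with and without the /u flag. Without it, "." matches a single UTF-16 code unit, so 𠮟 is torn apart exactly as with split('').

```javascript
const data = 'ABCDE𠮟漢字でも大丈夫';

// Without /u, "." matches one UTF-16 code unit: 𠮟 comes out as two lone surrogates.
const withoutU = data.match(/./g);
// With /u, "." matches one full code point: 𠮟 stays a single element.
const withU = data.match(/./gu);

console.log(withoutU.length); // 14
console.log(withU.length);    // 13
console.log(withU[5]);        // 𠮟
```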

Answer 2

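Use "叱" (U+53F1) instead of "𠮟" (U+20B9F); they are variant forms of the same character. 叱 lies in the Basic Multilingual Plane, so it is a single UTF-16 code unit and split('') handles it correctly.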

 const data = 'ABCDE叱漢字でも大丈夫';
 console.log(data);
 console.log(data.split(''));
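A quick check of why the substitution works: 叱 is one code unit, while 𠮟 needs two. The helper isOutsideBmp is hypothetical, written only for this illustration; it can also be used to scan a string for characters that split('') would break apart.

```javascript
// Hypothetical helper: true if the first code point of ch lies beyond the BMP (above U+FFFF).
function isOutsideBmp(ch) {
  return ch.codePointAt(0) > 0xFFFF;
}

console.log('叱'.length, isOutsideBmp('叱')); // 1 false
console.log('𠮟'.length, isOutsideBmp('𠮟')); // 2 true

// With the BMP variant, split('') yields one element per character.
const data = 'ABCDE叱漢字でも大丈夫';
console.log(data.split('').length); // 13

// Scan a string (by code point) for characters that split('') would tear apart.
const risky = Array.from('ABCDE𠮟漢字でも大丈夫').filter(isOutsideBmp);
console.log(risky); // [ '𠮟' ]
```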

2 answers

solution1  1  ACCEPTED  2020-06-10 13:46:08

solution2  1  2020-06-10 13:48:50