Running the following code in Node.js 14.3.0:
const data = 'ABCDE𠮟漢字でも大丈夫';
console.log(data);
console.log(data.split(''));
returns
ABCDE𠮟漢字でも大丈夫
[
'A', 'B', 'C', 'D',
'E', '�', '�', '漢',
'字', 'で', 'も', '大',
'丈', '夫'
]
Why is the 𠮟 character not being split properly? I have tested all jouyou kanji, and this character is the only one that yields this result.
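JavaScript's split dates from a time when strings were treated purely as sequences of 16-bit UTF-16 code units. A character outside the Basic Multilingual Plane, such as 𠮟 (U+20B9F), is stored as two code units (a surrogate pair), and split('') cuts between them, leaving two halves that are invalid on their own. To avoid breaking existing applications, it was decided not to alter this UTF-16 based implementation. Luckily, ES2015 added Array.from, which iterates a string by code point, for coping with this.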
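To make the surrogate-pair explanation concrete, here is a minimal sketch that inspects the code units of 𠮟. The variable names are only illustrative, and the character is written as the escape \u{20B9F} so the example does not depend on file encoding.

```javascript
// 𠮟 (U+20B9F) lies outside the Basic Multilingual Plane.
const kanji = '\u{20B9F}';

// length counts UTF-16 code units, not characters.
const unitCount = kanji.length;

// split('') cuts between code units, yielding a high and a low surrogate.
const units = kanji.split('').map((u) => u.charCodeAt(0).toString(16));

// codePointAt(0) reads the whole surrogate pair as one code point.
const codePoint = kanji.codePointAt(0).toString(16);

console.log(unitCount, units, codePoint);
// 2 [ 'd842', 'df9f' ] 20b9f
```

Each lone surrogate has no glyph of its own, which is why the console prints the replacement character for both halves.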
So for your example you can use Array.from or the array spread syntax. RegExp also has the u flag for Unicode, and I've included that as well, e.g.:
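const data = 'ABCDE𠮟漢字でも大丈夫';
// Spread iterates the string by code point, so 𠮟 stays in one piece.
console.log([...data]);
// With the u flag, . matches a whole code point instead of a single code unit.
console.log(data.match(/./gu));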
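As a quick check that the three approaches agree, the sketch below compares Array.from, spread, and the u-flag regex against split(''). The variable names are made up for illustration, and 𠮟 is written as \u{20B9F}.

```javascript
const data = 'ABCDE\u{20B9F}漢字でも大丈夫';

// All three of these walk the string by code point.
const viaFrom = Array.from(data);
const viaSpread = [...data];
const viaRegex = data.match(/./gu);

// split('') walks by UTF-16 code unit instead.
const viaSplit = data.split('');

console.log(viaFrom.length, viaSplit.length); // 13 14
console.log(viaFrom[5] === '\u{20B9F}');      // true
```

The one extra element in the split('') result is the second half of the surrogate pair.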
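Use "叱" (U+53F1) instead of "𠮟" (U+20B9F). It is a variant form of the same character, and it fits in a single UTF-16 code unit, so split('') handles it.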
const data = 'ABCDE叱漢字でも大丈夫';
console.log(data);
console.log(data.split(''));
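Note that the two forms are distinct code points, so swapping one for the other changes the string. A small sketch of the difference, with illustrative variable names:

```javascript
const bmpForm = '\u53F1';       // 叱, inside the Basic Multilingual Plane
const astralForm = '\u{20B9F}'; // 𠮟, outside it

// The BMP form is one code unit; the other form needs a surrogate pair.
console.log(bmpForm.length, astralForm.length); // 1 2

// The strings are not equal, so comparisons and lookups will treat them as different.
const sameString = bmpForm === astralForm;
console.log(sameString); // false

// split('') is safe for the BMP form.
const bmpParts = ('ABCDE' + bmpForm).split('');
console.log(bmpParts.length); // 6
```

Whether the substitution is acceptable depends on whether the text has to preserve the original form.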