How to split a Unicode string into characters in JavaScript

Question

For a long time we used a naive approach to split strings in JS:

someString.split('');

But the popularity of emoji forced us to change this approach: emoji characters (and other non-BMP characters) are made of two "characters" (UTF-16 code units).

String.fromCodePoint(128514).split(''); // array of 2 characters; can't embed due to StackOverflow limitations
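To make the failure concrete, here is a minimal sketch that inspects what the naive split produces for code point 128514 (U+1F602). The variable names are illustrative only and are not part of the original question.

```javascript
// U+1F602 (decimal 128514) lies outside the BMP, so UTF-16 stores it as a surrogate pair.
const face = String.fromCodePoint(128514);

// length counts UTF-16 code units, not characters.
const unitCount = face.length;

// split('') cuts between the two surrogates, leaving two lone halves.
const halves = face.split('');
const halfCodes = halves.map((h) => h.charCodeAt(0).toString(16));

// Joining the halves restores the original string, so nothing is lost,
// but each half on its own is not a valid character.
const roundTrip = halves.join('') === face;

console.log(unitCount);  // 2
console.log(halfCodes);  // [ 'd83d', 'de02' ]
console.log(roundTrip);  // true
```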

So what is the modern, correct, and performant approach to this task?

Answer 1

Using spread in an array literal:

    const str = "🌍🤖😸🎉";
    console.log([...str]); // one array element per emoji

Using for...of:

    function split(str) {
        const arr = [];
        // for...of uses the string iterator, which yields whole code points
        for (const char of str) {
            arr.push(char);
        }
        return arr;
    }

    const str = "🌍🤖😸🎉";
    console.log(split(str));

Answer 2

The best approach to this task is to use the native String.prototype[Symbol.iterator], which is aware of Unicode characters: it iterates by code point rather than by UTF-16 code unit. Consequently, a clean and easy way to split a string into Unicode characters is to call Array.from on the string, e.g.:

const string = String.fromCodePoint(128514, 32, 105, 32, 102, 101, 101, 108, 32, 128514, 32, 97, 109, 97, 122, 105, 110, 128514);
Array.from(string);
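As a small sketch of what the string iterator yields, Array.from also accepts a mapping function, so the code points can be inspected directly. The helper name codePointsOf and the shortened sample string are hypothetical and not taken from the answer.

```javascript
// Hypothetical helper: list the numeric code points of a string.
const codePointsOf = (s) => Array.from(s, (ch) => ch.codePointAt(0));

// Emoji, space, "i": three code points, but four UTF-16 code units.
const sample = String.fromCodePoint(128514, 32, 105);

const chars = Array.from(sample);    // one entry per code point
const points = codePointsOf(sample); // numeric value of each entry

console.log(sample.length); // 4
console.log(chars.length);  // 3
console.log(points);        // [ 128514, 32, 105 ]
```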

Answer 3

A flag was introduced in ECMAScript 2015 to support Unicode awareness in regular expressions.

Adding the u flag to your regex makes it return each complete character in the result.

    const withFlag = `AB😂DE`.match(/./ug);   // 5 entries, the emoji stays whole
    const withoutFlag = `AB😂DE`.match(/./g); // 6 entries, the emoji is cut into two surrogates
    console.log(withFlag, withoutFlag);

There's a little more about it here.
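Two caveats of the regex approach are worth a sketch: without the s (dotAll) flag, added in ES2018, the dot skips line terminators, and match returns null when nothing matches, for example on an empty string. The helper name splitWithRegex is hypothetical.

```javascript
// Hypothetical helper: split by code point with a regex.
// g = all matches, s = let . match line terminators, u = match whole code points.
// match() returns null when there is no match, so fall back to an empty array.
const splitWithRegex = (s) => s.match(/./gsu) || [];

const lines = 'a\nb';
const dropped = lines.match(/./gu);  // the newline is silently skipped
const kept = splitWithRegex(lines);  // the newline is preserved

console.log(dropped);            // [ 'a', 'b' ]
console.log(kept);               // [ 'a', '\n', 'b' ]
console.log(splitWithRegex('')); // []
```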

Answer 4

I did something like this in a project where I had to support older browsers and an ES5 minifier; it will probably be useful to others:

    if (Array.from && window.Symbol && window.Symbol.iterator) {
        array = Array.from(input[window.Symbol.iterator]());
    } else {
        array = ...; // maybe `input.split('');` as fallback if it doesn't matter
    }
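If the fallback does need to keep emoji intact, one possible ES5-compatible option (an assumption on my part, not something this answer specifies) is a regex that matches a surrogate pair as a single unit. The function name splitCodePointsES5 is hypothetical.

```javascript
// Hypothetical ES5-compatible fallback: keep surrogate pairs together.
// A high surrogate followed by a low surrogate is matched as one unit;
// anything else (including a lone surrogate) is matched one code unit at a time.
function splitCodePointsES5(input) {
    return input.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]|[\s\S]/g) || [];
}

// \uD83D\uDE02 is the surrogate pair for code point 128514.
var parts = splitCodePointsES5('AB\uD83D\uDE02DE');
console.log(parts.length); // 5
```

Like the iterator-based version, this only groups code points; it does not merge combining sequences into one user-perceived character.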

Answer 5

JavaScript has a newer API called Intl.Segmenter (part of the ECMAScript Internationalization API, ECMA-402) that allows you to split strings into grapheme clusters (the user-perceived characters of a string). With this API, your split might look like so:

    const split = (str) => {
        const itr = new Intl.Segmenter("en", { granularity: 'grapheme' }).segment(str);
        return Array.from(itr, ({ segment }) => segment);
    };

    console.log(split('é'));  // ['é']
    console.log(split('❤️')); // ['❤️']
    console.log(split('♀️')); // ['♀️']


This lets you handle not only emojis encoded as two UTF-16 code units (surrogate pairs), but also other user-perceived characters: composite characters (e.g. é), sequences joined by ZWJs (zero-width joiners), characters with variation selectors (e.g. ❤️), characters with emoji modifiers (e.g. ♀️), and so on. None of these multi-code-point characters can be handled by invoking the string iterator (via spread ..., for...of, Symbol.iterator, etc.) as in the other answers, because the iterator only walks the code points of the string.
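The difference between code points and grapheme clusters can be sketched as follows. The helper name toGraphemes and the fallback to Array.from are assumptions for illustration; the sample strings are written with escapes so that their code points are explicit.

```javascript
// Hypothetical helper: prefer grapheme clusters, and fall back to code points
// when Intl.Segmenter is not available in the runtime.
function toGraphemes(str, locale = 'en') {
    if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
        const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
        return Array.from(segmenter.segment(str), ({ segment }) => segment);
    }
    return Array.from(str); // degraded result: code points only
}

const decomposed = 'e\u0301';  // "e" followed by COMBINING ACUTE ACCENT
const heart = '\u2764\uFE0F';  // HEAVY BLACK HEART plus VARIATION SELECTOR-16
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}'; // three emoji joined by two ZWJs

for (const s of [decomposed, heart, family]) {
    // The first number is the code point count, the second the grapheme count.
    console.log(Array.from(s).length, toGraphemes(s).length);
}
```

Where Intl.Segmenter is available, each sample should come back as a single grapheme, while Array.from reports 2, 2, and 5 code points respectively.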


5 answers

Answer 1
12 2020-01-11 01:24:37

Answer 2
6 accepted 2016-02-05 11:35:38

Answer 3
4 2020-06-27 11:03:05

Answer 4
0 2021-12-08 11:54:07

Answer 5
0 2022-09-21 14:25:19
