How can I split a Unicode string into multiple Unicode characters in C#?

If I have a string like "😀123👨‍👩‍👧‍👦", how can I split it into an array that looks like ["😀", "1", "2", "3", "👨‍👩‍👧‍👦"]? If I use ToCharArray(), the first emoji is split into 2 chars and the second into 11 chars (it consists of 7 code points, 4 of which need a surrogate pair).

Update

The solution now looks like this:

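// Requires: using System.Collections.Generic;
public static List<string> GetCharacters(string text)
{
    char[] ca = text.ToCharArray();
    List<string> characters = new List<string>();
    for (int i = 0; i < ca.Length; i++)
    {
        char c = ca[i];
        // A char can never exceed 65535, so no range check is needed here.
        // A high surrogate followed by a low surrogate encodes one code point.
        if (char.IsHighSurrogate(c) && i + 1 < ca.Length && char.IsLowSurrogate(ca[i + 1]))
        {
            i++;
            characters.Add(new string(new[] { c, ca[i] }));
        }
        else
            characters.Add(new string(new[] { c }));
    }
    return characters;
}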

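Please note that, as mentioned in the comments, this doesn't work for the family emoji. It only handles characters that occupy at most 2 chars (a single surrogate pair), so the zero-width joiners (U+200D) that glue the family emoji together come out as separate elements. The output for the example would be: ["😀", "1", "2", "3", "👨", "\u200D", "👩", "\u200D", "👧", "\u200D", "👦"]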

.NET represents strings as a sequence of UTF-16 code units. Unicode code points outside the Basic Multilingual Plane (BMP) are encoded as a pair of code units, a high and a low surrogate. The lower 10 bits of each surrogate supply half of the 20-bit value obtained by subtracting 0x10000 from the real code point.
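To make the bit arithmetic concrete, here is a minimal sketch of how a surrogate pair maps back to its code point. The class and method names (SurrogateMath, Combine) are made up for illustration; in real code the built-in char.ConvertToUtf32 does the same job.

```csharp
using System;

public static class SurrogateMath
{
    // Hypothetical helper: combines a UTF-16 surrogate pair into the code point it encodes.
    public static int Combine(char high, char low)
    {
        if (!char.IsHighSurrogate(high) || !char.IsLowSurrogate(low))
            throw new ArgumentException("Expected a high surrogate followed by a low surrogate.");

        // Each surrogate carries 10 payload bits; together they encode (codePoint - 0x10000).
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }
}
```

For example, "😀" is stored as the two chars D83D and DE00, and SurrogateMath.Combine('\uD83D', '\uDE00') gives 0x1F600, the same value char.ConvertToUtf32 returns for that pair.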

There are helpers to detect these surrogates (e.g. Char.IsLowSurrogate).

You need to handle this yourself.
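As a rough sketch of what handling it yourself could look like, the following walks a string and collects whole code points using the built-in char.IsSurrogatePair and char.ConvertToUtf32 helpers. The names CodePointSplitter and GetCodePoints are hypothetical. Note that this only reassembles surrogate pairs; it does not group code points joined by U+200D, so the family emoji still comes out as 7 separate code points.

```csharp
using System.Collections.Generic;

public static class CodePointSplitter
{
    // Hypothetical helper: returns the Unicode code points of a UTF-16 string.
    public static List<int> GetCodePoints(string text)
    {
        var result = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                // Valid high + low surrogate: decode both chars as one code point.
                result.Add(char.ConvertToUtf32(text, i));
                i++; // skip the low surrogate
            }
            else
            {
                // BMP char (or an unpaired surrogate, kept as its raw value).
                result.Add(text[i]);
            }
        }
        return result;
    }
}
```

Returning code points as int keeps the sketch small; char.ConvertFromUtf32 can turn each one back into a string if an array of strings is needed.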

There is a solution which seems to work for the input you specified:

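// Requires: using System.Collections.Generic; using System.Globalization; using System.Linq;
static string[] SplitIntoTextElements(string input)
{
    // Enumerate text elements (grapheme clusters) rather than individual UTF-16 chars.
    IEnumerable<string> Helper()
    {
        for (var en = StringInfo.GetTextElementEnumerator(input); en.MoveNext();)
            yield return en.GetTextElement();
    }
    return Helper().ToArray();
}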

PS: This solution should work on .NET 5 and later; earlier .NET versions contain a bug that prevents correct splitting.
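One way to check how a given runtime behaves is to count the text elements it reports for a string. The sketch below uses StringInfo.LengthInTextElements; the wrapper names (TextElementCounter, Count) are made up for illustration.

```csharp
using System.Globalization;

public static class TextElementCounter
{
    // Hypothetical helper: counts text elements as StringInfo sees them on the current runtime.
    public static int Count(string input)
    {
        return new StringInfo(input).LengthInTextElements;
    }
}
```

If the note above holds, TextElementCounter.Count("😀123👨‍👩‍👧‍👦") should report 5 on .NET 5 and later, while an older runtime would be expected to report a larger number because it splits the family emoji at its joiners.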
