How to extract emojis and alphabetic characters from a string

Question

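I want to extract the emojis and alphabetic characters from a string into a collection. The string can contain any kind of emoji (activities, families, flags, animals) as well as letters. When I read the string from an EditText, it looks like "AB😄C😊D👨‍👩‍👧‍👦E🏳️‍🌈‍👭". I tried, but unfortunately the resulting list does not match what I expect. Can anyone suggest what I need to change to get the expected list?

I ran the code below in Eclipse; please correct me if I have written something wrong.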

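import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CodePoints {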

    public static void main(String []args){
        List<String> list = new ArrayList<>();
        for(int codePoint : codePoints("AB😄C😊D👨‍👩‍👧‍👦E🏳️‍🌈‍👭")) {
            list.add(String.valueOf(Character.toChars(codePoint)));
        }

        System.out.println(Arrays.toString(list.toArray()));
    }

    public static Iterable<Integer> codePoints(final String string) {
     return new Iterable<Integer>() {
       public Iterator<Integer> iterator() {
         return new Iterator<Integer>() {
           int nextIndex = 0;
           public boolean hasNext() {
             return nextIndex < string.length();
           }
           public Integer next() {
             int result = string.codePointAt(nextIndex);
             nextIndex += Character.charCount(result);
             return result;
           }
           public void remove() {
             throw new UnsupportedOperationException();
           }
         };
       }
     };
   }
}

Output:
[A, B, 😄, C, 😊, D, 👨, ‍, 👩, ‍, 👧, ‍, 👦, E, 🏳, ️, ‍, 🌈, ‍, 👭]

Expected:
[A, B, 😄, C, 😊, D, 👨‍👩‍👧‍👦, E, 🏳️‍🌈‍, 👭]

Answer 1

The problem is that your string contains invisible characters.
They are:
Unicode character 'ZERO WIDTH JOINER' (U+200D)
Unicode character 'VARIATION SELECTOR-16' (U+FE0F)
Other similar ones include:
Unicode character 'SOFT HYPHEN' (U+00AD)
...
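
To see these invisible characters for yourself, you can dump every code point of the string together with its Unicode name. This is only a small sketch; the class name CodePointInspector and the method describe are made up for illustration, while Character.getName, codePointAt and Character.charCount are standard JDK calls.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointInspector {

    // Returns one "U+XXXX NAME" entry per code point in the input.
    static List<String> describe(String text) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            // Character.getName returns null for unassigned code points.
            String name = Character.getName(codePoint);
            lines.add(String.format("U+%04X %s", codePoint, name));
            // Advance by 1 or 2 chars depending on the code point.
            i += Character.charCount(codePoint);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Rainbow flag written with escapes so the hidden parts are visible:
        // white flag, VARIATION SELECTOR-16, ZERO WIDTH JOINER, rainbow.
        String flag = "\uD83C\uDFF3\uFE0F\u200D\uD83C\uDF08";
        for (String line : describe(flag)) {
            System.out.println(line);
        }
    }
}
```

The names printed for the emoji themselves depend on the Unicode version your JDK ships with, but the joiner and the variation selector should show up as separate entries.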

Java chars are UTF-16 code units; see https://en.wikipedia.org/wiki/UTF-16 and
https://docs.oracle.com/javase/7/docs/api/java/lang/String.html

A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.
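
A quick way to convince yourself of the quoted paragraph is to compare length() with codePointCount() for a single emoji. The sketch below uses a made-up class name (SurrogatePairDemo) and helper names; the JDK calls are the standard ones.

```java
class SurrogatePairDemo {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String text) {
        return text.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // U+1F604 (smiling face) written as its surrogate pair.
        String smile = "\uD83D\uDE04";
        System.out.println(codeUnits(smile));                           // 2
        System.out.println(codePoints(smile));                          // 1
        System.out.println(Character.isHighSurrogate(smile.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(smile.charAt(1)));  // true
    }
}
```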

Here is one way to iterate over the individual Unicode characters (code points) in a string.

public static List<String> getUnicodeCharacters(String str) {
    List<String> result = new ArrayList<>();
    char[] charArray = str.toCharArray();
    for (int i = 0; i < charArray.length; ) {
        if (Character.isHighSurrogate(charArray[i])
                && (i + 1) < charArray.length
                && Character.isLowSurrogate(charArray[i + 1])) {
            result.add(new String(new char[]{charArray[i], charArray[i + 1]}));
            i += 2;
        } else {
            result.add(new String(new char[]{charArray[i]}));
            i++;
        }
    }
    return result;
}

@Test
void getUnicodeCharacters() {
    String str = "AB😄C😊D👨‍👩‍👧‍👦E🏳️‍🌈‍👭";
    System.out.println(str.codePointCount(0, str.length()));
    for (String unicodeCharacter : UTF_16.getUnicodeCharacters(str)) {
        if ("\u200D".equals(unicodeCharacter)
                || "\uFE0F".equals(unicodeCharacter))
            continue;
        System.out.println(unicodeCharacter);
    }
}
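
The test above skips the joiners, so each person in the family emoji is still printed separately. If you would rather keep joined sequences together, as in the expected output of the question, one possible approach is to glue a code point onto the previous element whenever it is a ZERO WIDTH JOINER, a VARIATION SELECTOR-16, or directly follows a joiner. This is a simplified sketch, not a full grapheme-cluster implementation: the class EmojiClusters and the method split are hypothetical names, and skin-tone modifiers, regional-indicator flags and combining marks are not handled.

```java
import java.util.ArrayList;
import java.util.List;

class EmojiClusters {

    private static final int ZERO_WIDTH_JOINER = 0x200D;
    private static final int VARIATION_SELECTOR_16 = 0xFE0F;

    // Splits text into elements, keeping ZWJ sequences in one element.
    static List<String> split(String text) {
        List<String> clusters = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean previousWasJoiner = false;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            // Joiners and selectors attach to what came before them;
            // anything right after a joiner attaches as well.
            boolean attaches = codePoint == ZERO_WIDTH_JOINER
                    || codePoint == VARIATION_SELECTOR_16
                    || previousWasJoiner;
            if (!attaches && current.length() > 0) {
                clusters.add(current.toString());
                current.setLength(0);
            }
            current.appendCodePoint(codePoint);
            previousWasJoiner = codePoint == ZERO_WIDTH_JOINER;
            i += Character.charCount(codePoint);
        }
        if (current.length() > 0) {
            clusters.add(current.toString());
        }
        return clusters;
    }

    public static void main(String[] args) {
        // "D", then man ZWJ woman ZWJ girl ZWJ boy, then "E".
        String text = "D\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC66E";
        System.out.println(split(text).size()); // 3
    }
}
```

Note that in the string from the question there is a joiner between the rainbow and 👭, so this rule keeps the flag and 👭 together as one element instead of splitting them as the expected list does.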

Solution 1
0 2019-01-02 08:26:34
