
How to extract emoji and alphabet characters from a string

I want to extract the emoji and alphabetic characters from a string into a collection. The string can contain any kind of emoji (activity, family, flag, animal symbols) as well as alphabetic characters. The string I get from an EditText looks like "AB😄C😊D👨‍👩‍👧‍👦E🏳️‍🌈‍👭". I tried the code below, but the resulting list does not match my expectation. Can anyone suggest what I need to change to get the expected list?

I tried this piece of code in Eclipse; please correct me if I am wrong:

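import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CodePoints {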

    public static void main(String []args){
        List<String> list = new ArrayList<>();
        for(int codePoint : codePoints("AB😄C😊D👨‍👩‍👧‍👦E🏳️‍🌈‍👭")) {
            list.add(String.valueOf(Character.toChars(codePoint)));
        }

        System.out.println(Arrays.toString(list.toArray()));
    }

    public static Iterable<Integer> codePoints(final String string) {
     return new Iterable<Integer>() {
       public Iterator<Integer> iterator() {
         return new Iterator<Integer>() {
           int nextIndex = 0;
           public boolean hasNext() {
             return nextIndex < string.length();
           }
           public Integer next() {
             int result = string.codePointAt(nextIndex);
             nextIndex += Character.charCount(result);
             return result;
           }
           public void remove() {
             throw new UnsupportedOperationException();
           }
         };
       }
     };
   }
}

Output:
[A, B, 😄, C, 😊, D, 👨, ‍, 👩, ‍, 👧, ‍, 👦, E, 🏳, ️, ‍, 🌈, ‍, 👭]

Expected:
[A, B, 😄, C, 😊, D, 👨‍👩‍👧‍👦, E, 🏳️‍🌈‍, 👭]

The problem is that your string contains invisible characters. Each of them is a code point of its own, so iterating by code point returns them as separate list elements.
They are:
Unicode Character 'ZERO WIDTH JOINER' (U+200D)
Unicode Character 'VARIATION SELECTOR-16' (U+FE0F)
Other similar ones are:
Unicode Character 'SOFT HYPHEN' (U+00AD)
...
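To make the invisible characters visible, it can help to print every code point as a U+XXXX label. The following is a minimal sketch, assuming Java 8 or later (for `String.codePoints()`); the class name `CodePointDump` and the method `describe` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {
    // Returns each code point of the input as a "U+XXXX" label.
    static List<String> describe(String s) {
        List<String> labels = new ArrayList<>();
        s.codePoints().forEach(cp -> labels.add(String.format("U+%04X", cp)));
        return labels;
    }

    public static void main(String[] args) {
        // WHITE FLAG, VARIATION SELECTOR-16, ZERO WIDTH JOINER, RAINBOW
        String rainbowFlag = "\uD83C\uDFF3\uFE0F\u200D\uD83C\uDF08";
        // prints [U+1F3F3, U+FE0F, U+200D, U+1F308]
        System.out.println(describe(rainbowFlag));
    }
}
```

What looks like a single flag on screen is therefore four code points, two of which have no visible glyph.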

A Java char is a UTF-16 code unit; see https://en.wikipedia.org/wiki/UTF-16 and the String documentation at
https://docs.oracle.com/javase/7/docs/api/java/lang/String.html, which says:

A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.
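As a small illustration of the quoted passage, the sketch below compares the number of char code units with the number of code points for one emoji. The class name `SurrogateDemo` and the helper `counts` are hypothetical names chosen for this example.

```java
public class SurrogateDemo {
    // Returns { number of UTF-16 code units, number of code points }.
    static int[] counts(String s) {
        return new int[] { s.length(), s.codePointCount(0, s.length()) };
    }

    public static void main(String[] args) {
        String smile = "\uD83D\uDE04"; // U+1F604, stored as a surrogate pair
        int[] c = counts(smile);
        System.out.println(c[0]); // 2 code units
        System.out.println(c[1]); // 1 code point
        System.out.println(Character.isHighSurrogate(smile.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(smile.charAt(1)));  // true
    }
}
```

This is why indexing with `charAt` alone splits an emoji in half, while `codePointAt` combined with `Character.charCount` steps over the whole pair.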

The following method iterates over the individual Unicode characters (code points) in a string:

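// Requires java.util.ArrayList and java.util.List.
// Splits a string into code points, keeping each surrogate pair together.
public static List<String> getUnicodeCharacters(String str) {
    List<String> result = new ArrayList<>();
    char[] charArray = str.toCharArray();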
    for (int i = 0; i < charArray.length; ) {
        if (Character.isHighSurrogate(charArray[i])
                && (i + 1) < charArray.length
                && Character.isLowSurrogate(charArray[i + 1])) {
            result.add(new String(new char[]{charArray[i], charArray[i + 1]}));
            i += 2;
        } else {
            result.add(new String(new char[]{charArray[i]}));
            i++;
        }
    }
    return result;
}

@Test
void getUnicodeCharacters() {
    String str = "AB😄C😊D👨‍👩‍👧‍👦E🏳️‍🌈‍👭";
    System.out.println(str.codePointCount(0, str.length()));
    for (String unicodeCharacter : UTF_16.getUnicodeCharacters(str)) {
        if ("\u200D".equals(unicodeCharacter)
                || "\uFE0F".equals(unicodeCharacter))
            continue;
        System.out.println(unicodeCharacter);
    }
}
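The test above drops the joiner and the variation selector, so each part of a family or flag emoji is still printed separately. To keep such sequences together, one option is to glue those invisible characters (and whatever follows a joiner) onto the preceding element. The sketch below is a simplified assumption, not the full Unicode grapheme cluster rules (UAX #29): it only knows about U+200D and U+FE0F, and the class name `EmojiGrouper` and method `group` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class EmojiGrouper {
    private static final int ZWJ = 0x200D;  // ZERO WIDTH JOINER
    private static final int VS16 = 0xFE0F; // VARIATION SELECTOR-16

    // Simplified grouping: ZWJ and VS16 attach to the current element,
    // and the code point right after a ZWJ is attached as well.
    static List<String> group(String s) {
        List<String> clusters = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean joinNext = false;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            boolean attaches = cp == ZWJ || cp == VS16 || joinNext;
            if (!attaches && current.length() > 0) {
                clusters.add(current.toString()); // close the previous element
                current.setLength(0);
            }
            current.appendCodePoint(cp);
            joinNext = (cp == ZWJ);
        }
        if (current.length() > 0) {
            clusters.add(current.toString());
        }
        return clusters;
    }

    public static void main(String[] args) {
        // MAN, ZWJ, WOMAN, ZWJ, GIRL, ZWJ, BOY
        String family = "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC66";
        System.out.println(group("D" + family + "E").size()); // 3
    }
}
```

Note that for the string in the question this yields nine elements rather than the ten the asker expected: the rainbow is followed by another ZERO WIDTH JOINER, so under this rule the flag and the trailing 👭 end up in the same element.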

