How can I replace every emoji in a string with its Unicode value in Java?

Question

I have a string like this:

"\"title\":\"👺TEST title value 😁\",\"text\":\"💖 TEST text value.\"" ...

I want to replace every emoji with its Unicode value, like this:

"\"title\":\"U+1F47ATEST title value U+1F601\",\"text\":\"U+1F496 TEST text value.\"" ...

After a lot of searching online, I found a way to "translate" a single symbol into its Unicode value with the following code:

String s = "👺";
int emoji = Character.codePointAt(s, 0); 
String unumber = "U+" + Integer.toHexString(emoji).toUpperCase();

But how do I change my code so that it handles all the emoji in the string?

P.S. The output can be in either \\Uxxxxx or U+xxxxx format.
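
For reference, the two output formats mentioned above could be produced from a code point roughly as follows. This is only a sketch: the class and method names are made up for illustration, and the eight-digit width of the backslash form is an assumption borrowed from the C/Python style of escape, since the question does not pin down the exact width.

```java
class CodePointFormats {

    // "U+1F47A" style: upper-case hex, padded to at least four digits.
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    // "\U0001F47A" style: backslash, capital U, eight hex digits (assumed width).
    static String toBackslashU(int codePoint) {
        return String.format("\\U%08X", codePoint);
    }

    public static void main(String[] args) {
        // Build the goblin emoji from its code point to avoid source-encoding issues.
        String s = new String(Character.toChars(0x1F47A));
        int cp = s.codePointAt(0);
        System.out.println(toUPlus(cp));
        System.out.println(toBackslashU(cp));
    }
}
```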

Answer 1

Try this solution. Note that it converts every character encoded as a surrogate pair, not only emoji:

String s = "your string with emoji";

StringBuilder sb = new StringBuilder();

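for (int i = 0; i < s.length(); i++) {
  // A high surrogate followed by a low surrogate encodes one supplementary code point.
  if (Character.isHighSurrogate(s.charAt(i)) && i + 1 < s.length()
      && Character.isLowSurrogate(s.charAt(i + 1))) {
    int res = Character.codePointAt(s, i);
    i++; // skip the low surrogate that was just consumed
    sb.append("U+").append(Integer.toHexString(res).toUpperCase());
  } else {
    sb.append(s.charAt(i)); // BMP characters (and unpaired surrogates) are copied as is
  }
}

// result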
System.out.println(sb.toString());

Answer 2

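Emoji are scattered across different Unicode blocks. For example, 👺 (0x1F47A) and 💖 (0x1F496) come from Miscellaneous Symbols and Pictographs, while 😁 (0x1F601) comes from Emoticons.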
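
To see which block a given symbol belongs to, something like the following sketch can help. The class and method names are hypothetical; the only library call is Character.UnicodeBlock.of(int). The last two code points are extra examples that are not part of the question, added to show that emoji-like symbols also live in other blocks, including one in the BMP that is not encoded as a surrogate pair.

```java
class EmojiBlocks {

    // Thin wrapper around the JDK lookup; returns null for unassigned code points.
    static Character.UnicodeBlock blockOf(int codePoint) {
        return Character.UnicodeBlock.of(codePoint);
    }

    public static void main(String[] args) {
        int[] samples = {
            0x1F47A, // goblin, from the question
            0x1F496, // sparkling heart, from the question
            0x1F601, // grinning face, from the question
            0x2764,  // heavy black heart: a BMP character, no surrogate pair
            0x1F680  // rocket
        };
        for (int cp : samples) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
                    + " -> " + blockOf(cp));
        }
    }
}
```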

If you want to filter symbols, you need to decide which Unicode blocks (or ranges within them) to use. For example:

    String s = "\"title\":\"👺TEST title value 😁\",\"text\":\"💖 TEST text value.\"";
    StringBuilder sb = new StringBuilder();
    for (int i = 0, l = s.length() ; i < l ; i++) {
      char ch = s.charAt(i);
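      // Require a following low surrogate so a dangling high surrogate at the end cannot cause an out-of-bounds read
      if (Character.isHighSurrogate(ch) && i + 1 < l && Character.isLowSurrogate(s.charAt(i + 1))) {
        i++;
        char ch2 = s.charAt(i); // Load low surrogate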
        int codePoint = Character.toCodePoint(ch, ch2);
        if ((codePoint >= 0x1F300) && (codePoint <= 0x1F64F)) { // Miscellaneous Symbols and Pictographs + Emoticons
          sb.append("U+").append(Integer.toHexString(codePoint).toUpperCase());
        } else { // otherwise just add characters as is
          sb.append(ch);
          sb.append(ch2);
        }
      } else { // if not a surrogate, just add the character
        sb.append(ch);
      }
    }
    String result = sb.toString();
    System.out.println(result); // "title":"U+1F47ATEST title value U+1F601","text":"U+1F496 TEST text value."

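To match only emoji, you can narrow the condition using, for example, this list.

But if you want to escape every character that is encoded as a surrogate pair, you can simply remove the codePoint check from the code.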
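
A minimal sketch of that last variant, with the range check removed, might look like this. It swaps the manual surrogate handling for String.codePoints() (Java 8 or later), which is a different technique from the loop above; the class and method names are made up for illustration.

```java
class EscapeSupplementary {

    // Replaces every supplementary code point (anything above U+FFFF) with "U+XXXXX".
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (Character.isSupplementaryCodePoint(cp)) {
                sb.append("U+").append(Integer.toHexString(cp).toUpperCase());
            } else {
                sb.appendCodePoint(cp); // BMP characters are copied unchanged
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\uD83D\uDC7A" is the surrogate pair for U+1F47A.
        System.out.println(escape("\uD83D\uDC7ATEST title value"));
    }
}
```

Because no block or range is checked, this also escapes supplementary characters that are not emoji, such as rare CJK ideographs.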

Answer 3

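In your code you don't need to specify any code point ranges, nor do you need to worry about surrogates. Instead, just specify the Unicode blocks whose characters you want rendered in Unicode-escaped form. This is done using the fields declared in the Character.UnicodeBlock class. For example, to determine whether 😁 (0x1F601) is an emoticon: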

boolean emoticon = Character.UnicodeBlock.EMOTICONS.equals(Character.UnicodeBlock.of("😁".codePointAt(0)));
System.out.println("Is 😁 an emoticon? " + emoticon); // Prints true.

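Here is general-purpose code. It processes any String and renders individual characters as their Unicode equivalents if they are defined in one of the specified Unicode blocks: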

package symbolstounicode;

import java.util.List;
import java.util.stream.Collectors;

public class SymbolsToUnicode {

    public static void main(String[] args) {

        Character.UnicodeBlock[] blocksToConvert = new Character.UnicodeBlock[]{
            Character.UnicodeBlock.EMOTICONS, 
            Character.UnicodeBlock.MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS};
        String input = "\"title\":\"👺TEST title value 😁\",\"text\":\"💖 TEST text value.\"";
        String output = SymbolsToUnicode.toUnicode(input, blocksToConvert);

        System.out.println("String to convert: " + input);
        System.out.println("Converted string: " + output);
        assert ("\"title\":\"U+1F47ATEST title value U+1F601\",\"text\":\"U+1F496 TEST text value.\"".equals(output));
    }

    // Converts characters in the supplied string found in the specified list of UnicodeBlocks to their Unicode equivalents.
    static String toUnicode(String s, final Character.UnicodeBlock[] blocks) {

        StringBuilder sb = new StringBuilder("");
        List<Integer> cpList = s.codePoints().boxed().collect(Collectors.toList());

        cpList.forEach(cp -> sb.append(SymbolsToUnicode.inCodeBlock(cp, blocks) ? 
                "U+" + Integer.toHexString(cp).toUpperCase() : Character.toString(cp)));
        return sb.toString();
    }

    // Returns true if the supplied code point is within one of the specified UnicodeBlocks.
    static boolean inCodeBlock(final int cp, final Character.UnicodeBlock[] blocksToConvert) {

        for (Character.UnicodeBlock b : blocksToConvert) {
            if (b.equals(Character.UnicodeBlock.of(cp))) {
                return true;
            }
        }
        return false;
    }
}

Here is the output, using the test data from the OP:

run:
String to convert: "title":"👺TEST title value 😁","text":"💖 TEST text value."
Converted string: "title":"U+1F47ATEST title value U+1F601","text":"U+1F496 TEST text value."
BUILD SUCCESSFUL (total time: 0 seconds)

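Notes:

I used the font Segoe UI Symbol for the code and output windows so that the symbols render correctly.
The basic idea in the code is:
- First, specify the String to convert, and the Unicode blocks whose characters need to be converted to Unicode.
- Next, use String.codePoints() to turn the String into a sequence of code points and store them in a List.
- Finally, for each code point, determine whether it belongs to any of the specified Unicode blocks, and convert it if necessary.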
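
The same three steps can also be sketched more compactly. This variant is an illustration under a few assumptions, not the code from this answer: it skips the intermediate List, keeps the blocks in a Set, and uses StringBuilder.appendCodePoint so that it runs on Java 8 (Character.toString(int), used in the code above, was only added in Java 11). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class BlockEscaper {

    // Escapes every code point whose Unicode block is one of the given blocks.
    static String escape(String s, Character.UnicodeBlock... blocks) {
        Set<Character.UnicodeBlock> wanted = new HashSet<>(Arrays.asList(blocks));
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            // UnicodeBlock.of may return null for unassigned code points; contains(null) is false.
            if (wanted.contains(Character.UnicodeBlock.of(cp))) {
                sb.append("U+").append(Integer.toHexString(cp).toUpperCase());
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Surrogate pairs for U+1F47A and U+1F601, written as escapes.
        String input = "\uD83D\uDC7ATEST title value \uD83D\uDE01";
        System.out.println(escape(input,
                Character.UnicodeBlock.EMOTICONS,
                Character.UnicodeBlock.MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS));
    }
}
```

Passing only some blocks leaves symbols from the other blocks untouched, which makes it easy to widen or narrow what counts as an emoji.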

3 solutions

Solution 1
3 2019-12-17 15:34:21

Solution 2
1 2019-12-17 15:38:28

Solution 3
0 2019-12-20 08:44:54
