
Convert UTF-8 Unicode string to ASCII Unicode escaped String

I need to convert a Unicode string into a string in which the non-ASCII characters are encoded as Unicode escapes. For example, the string "漢字 Max" should be presented as "\u6F22\u5B57 Max".

What I have tried:

  1. Different combinations of

    new String(sourceString.getBytes(encoding1), encoding2)

  2. Apache StringEscapeUtils, which also escapes ASCII characters such as the double quote

    StringEscapeUtils.escapeJava(source)

Is there an easy way to encode such a string? Ideally, only Java 6 SE or Apache Commons should be used to achieve the desired result.

This is the kind of simple code Jon Skeet had in mind in his comment:

final String in = "šđčćasdf";
final StringBuilder out = new StringBuilder();
for (int i = 0; i < in.length(); i++) {
  final char ch = in.charAt(i);
  if (ch <= 127) out.append(ch);
  else out.append("\\u").append(String.format("%04x", (int)ch));
}
System.out.println(out.toString());

As Jon said, surrogate pairs will be represented as a pair of \u escapes.
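To make the surrogate-pair remark concrete, here is a minimal sketch that wraps the loop above in a method and feeds it a character outside the Basic Multilingual Plane. The class name `SurrogatePairEscapeDemo`, the method name `escapeNonAscii`, and the choice of U+1F600 as test input are assumptions made for illustration, not part of the original answer.

```java
public class SurrogatePairEscapeDemo {

    // Same per-char rule as the loop above: ASCII passes through, the rest is escaped.
    static String escapeNonAscii(String in) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char ch = in.charAt(i);
            if (ch <= 127) {
                out.append(ch);
            } else {
                out.append("\\u").append(String.format("%04x", (int) ch));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 does not fit in one 16-bit char, so Java stores it as two chars.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());               // number of chars, not code points
        System.out.println(escapeNonAscii(emoji + " ok"));
    }
}
```

Running this should print 2 followed by \ud83d\ude00 ok: one code point, two escapes. That is the form Java source files and .properties files expect, so iterating by char rather than by code point is the appropriate choice here.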

Guava Escaper Based Solution:

This escapes every character outside the range 32 to 127 into a Unicode escape sequence. That covers all non-ASCII characters and, unlike the plain loop, ASCII control characters such as tab and newline as well.

import static java.lang.String.format;    
import com.google.common.escape.CharEscaper;

public class NonAsciiUnicodeEscaper extends CharEscaper
{
    @Override
    protected char[] escape(final char c)
    {
        if (c >= 32 && c <= 127) { return new char[]{c}; }
        else { return format("\\u%04x", (int) c).toCharArray(); }
    }
}
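The range check in `escape(char)` starts at 32, so control characters are escaped too. The sketch below mirrors that rule using only the standard library, so it can be run without Guava on the classpath. The class `ControlCharEscapeDemo` and the method `escapeOutsidePrintableRange` are hypothetical names, and this is a stand-in for the per-character rule only, not for Guava's `CharEscaper` machinery.

```java
public class ControlCharEscapeDemo {

    // Hypothetical stdlib-only mirror of the rule in escape(char) above; not Guava's API.
    static String escapeOutsidePrintableRange(String in) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c >= 32 && c <= 127) {
                out.append(c);                                   // kept verbatim
            } else {
                out.append(String.format("\\u%04x", (int) c));   // control or non-ASCII
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Input: letter a, a tab, letter b, a space, then e with acute accent (U+00E9).
        String input = "a" + (char) 9 + "b " + (char) 0xE9;
        System.out.println(escapeOutsidePrintableRange(input));
    }
}
```

This should print a\u0009b \u00e9: the tab is escaped along with the accented letter, while the space (code 32) is left alone. Whether escaping control characters is desirable depends on where the output is going; the simple loop in the other answer leaves them untouched.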

