How can I convert these UTF-8 literals into character strings?
I have UTF-8 literals like this:
String literal = "\x6c\x69b/\x62\x2f\x6d\x69nd/m\x61x\x2e\x70h\x70";
I need to read them and convert them into plain text.
Is there an import in Java that can interpret these?
Thank you.
Java doesn't support UTF-8 literals per se, and it has no \xnn escape sequence at all. Java's linguistic support for Unicode is limited to UTF-16-based Unicode escapes.
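To illustrate what "UTF-16-based" means in practice, here is a small sketch (the class and field names are made up for illustration). A character inside the Basic Multilingual Plane takes one Unicode escape, while a character outside it, such as U+1F600, has to be written as a surrogate pair of two escapes:

```java
class Utf16EscapeDemo {
    // U+00E9 lies in the Basic Multilingual Plane: one Unicode escape, one char
    static final String E_ACUTE = "\u00e9";
    // U+1F600 lies outside the BMP, so it must be written as a UTF-16
    // surrogate pair: two Unicode escapes, two chars, but one code point
    static final String GRINNING_FACE = "\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(E_ACUTE.length());                                        // 1
        System.out.println(GRINNING_FACE.length());                                  // 2
        System.out.println(GRINNING_FACE.codePointCount(0, GRINNING_FACE.length())); // 1
        System.out.println(GRINNING_FACE.codePointAt(0) == 0x1F600);                 // true
    }
}
```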
You can express your characters in a String literal using Unicode escapes, as follows:
String literal =
"\u006c\u0069b/\u0062\u002f\u006d\u0069nd/m\u0061x\u002e\u0070h\u0070";
(Assuming no typing errors...)
Or you could (in this case) replace the escapes with plain ASCII characters.
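As a quick check of the two options, this sketch (class and constant names are illustrative only) shows that the Unicode-escape literal and the plain ASCII literal denote the same string. Unicode escapes are translated by the compiler before the source is tokenized, so both constants end up identical:

```java
class EscapeDemo {
    // The Unicode-escape version of the question's string
    static final String ESCAPED =
        "\u006c\u0069b/\u0062\u002f\u006d\u0069nd/m\u0061x\u002e\u0070h\u0070";
    // The same text written with ordinary ASCII characters
    static final String PLAIN = "lib/b/mind/max.php";

    public static void main(String[] args) {
        System.out.println(ESCAPED);               // lib/b/mind/max.php
        System.out.println(ESCAPED.equals(PLAIN)); // true
    }
}
```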
Note that converting UTF-8 to UTF-16 is not normally that simple. (It is simple in this case because the \xnn values are all less than 0x80, so each one represents a single Unicode code point, which is also a single UTF-16 code unit.)
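To see why bytes of 0x80 and above are harder, consider U+6211 (我), whose UTF-8 encoding is the three bytes E6 88 91. This sketch (the class name is illustrative) decodes those bytes as one UTF-8 sequence, then contrasts that with ISO-8859-1, which maps each byte to its own char and so yields three wrong characters:

```java
import java.nio.charset.StandardCharsets;

class MultiByteDemo {
    // UTF-8 encoding of U+6211: three bytes, all of them 0x80 or above
    static final byte[] UTF8_BYTES = {(byte) 0xE6, (byte) 0x88, (byte) 0x91};

    public static void main(String[] args) {
        // Correct: decode the byte sequence as a whole
        String decoded = new String(UTF8_BYTES, StandardCharsets.UTF_8);
        System.out.println(decoded.length());                  // 1
        System.out.printf("U+%04X%n", decoded.codePointAt(0)); // U+6211
        // Wrong: one char per byte, as ISO-8859-1 does
        String naive = new String(UTF8_BYTES, StandardCharsets.ISO_8859_1);
        System.out.println(naive.length());                    // 3
    }
}
```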
Another approach is to represent the UTF-8 as an array of bytes and convert that to a String, e.g.:
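import java.nio.charset.StandardCharsets;

byte[] bytes = new byte[]{
    0x6c, 0x69, 'b', '/', 0x62, 0x2f, 0x6d, 0x69, 'n', 'd',
    '/', 'm', 0x61, 'x', 0x2e, 0x70, 'h', 0x70};
// StandardCharsets.UTF_8 avoids the checked UnsupportedEncodingException
String str = new String(bytes, StandardCharsets.UTF_8);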
(Again, assuming no typing errors.)
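If, as the question suggests, the \xnn sequences arrive at runtime as literal text (a backslash, an x, and two hex digits), the byte-array idea can be applied programmatically. The following is a minimal sketch, not a library API: decodeHexEscapes is a hypothetical helper, and it assumes every character that is not part of an escape is plain ASCII.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class HexEscapeDecoder {
    // Hypothetical helper: converts text containing literal \xNN escapes into
    // bytes (each escape becomes the byte 0xNN), then decodes them as UTF-8.
    static String decodeHexEscapes(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\\' && i + 3 < text.length() && text.charAt(i + 1) == 'x'
                    && isHexDigit(text.charAt(i + 2)) && isHexDigit(text.charAt(i + 3))) {
                // The two hex digits after the backslash-x give the raw byte value
                out.write(Integer.parseInt(text.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                // Assumption: unescaped characters are ASCII, i.e. one byte each
                out.write(c);
                i++;
            }
        }
        // Decode only after all bytes are collected, so multi-byte sequences
        // spread over several escapes are reassembled correctly
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        // The question's input; backslashes are doubled because this is Java source
        String input = "\\x6c\\x69b/\\x62\\x2f\\x6d\\x69nd/m\\x61x\\x2e\\x70h\\x70";
        System.out.println(decodeHexEscapes(input)); // lib/b/mind/max.php
    }
}
```

Because decoding happens once at the end, an input such as the three escapes \xe6\x88\x91 comes out as the single character U+6211 rather than three separate chars.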
If you have the characters in a file to be read, you can use InputStreamReader to convert from whatever charset the text is in to a sequence of char:
InputStream is = ...; // get the input stream however you want
InputStreamReader isr = new InputStreamReader(is, "charset-name");
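Filling in that fragment, here is one possible sketch that reads an entire stream into a String, assuming the data is UTF-8. The class and method names are illustrative; to keep it self-contained, main uses a ByteArrayInputStream as a stand-in for a file, whereas for a real file you would pass something like Files.newInputStream(path).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ReaderDemo {
    // Reads the whole stream, decoding its bytes as UTF-8 into chars
    static String readAll(InputStream is) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file: "lib/" followed by the three UTF-8 bytes of U+6211
        byte[] data = {0x6c, 0x69, 'b', '/', (byte) 0xE6, (byte) 0x88, (byte) 0x91};
        String text = readAll(new ByteArrayInputStream(data));
        System.out.println(text.length());             // 5
        System.out.println(text.equals("lib/\u6211")); // true
    }
}
```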