
How to convert Telugu characters into UTF-8 encoded characters in Java?

I have an input character like this: ఈ. For this character I need the equivalent hex entity, "0C08". Is there a built-in function in Java for this?

Thanks in advance.

Java keeps characters in Unicode internally (as UTF-16 code units), so an encoding has to be specified whenever you read from or write to an external byte stream.

Note that this code should print the same text twice on a UTF-8 console:

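// requires: import java.nio.charset.StandardCharsets;
String value = columnDetails.getColumnName();
System.out.println(value); // output with the default encoding
System.out.write(value.getBytes(StandardCharsets.UTF_8)); // output as UTF-8 bytes; write(byte[]) declares IOException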
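
The point about naming an encoding at the byte boundary can be sketched as a round trip through a Writer and a Reader. This is only an illustration: the class name EncodingRoundTrip and its two helper methods are hypothetical and not part of the original answer, and in-memory streams stand in for a real file or socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this illustration.
final class EncodingRoundTrip {

    // Encodes text to bytes through a Writer with an explicit charset.
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decodes bytes back to text through a Reader with the same charset.
    static String readUtf8(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = writeUtf8("\u0C08"); // U+0C08 is the Telugu letter from the question
        System.out.println(bytes.length);                     // 3 bytes in UTF-8
        System.out.println(readUtf8(bytes).equals("\u0C08")); // true
    }
}
```

Using the same charset on both sides is what makes the round trip lossless; relying on the platform default on either side is where mismatches usually creep in.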

Edit: If you want the hex representation of the UTF-8 encoding, then try this:

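// requires: import java.nio.charset.StandardCharsets;
// formats each byte as two uppercase hex digits
static String toHex(byte[] b) {
  StringBuilder s = new StringBuilder();
  for (int i = 0; i < b.length; ++i) s.append(String.format("%02X", b[i] & 0xff));
  return s.toString();
}
System.out.println(toHex("ఈ".getBytes(StandardCharsets.UTF_8))); // prints E0B088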

Edit 2: or, if you want the Unicode value (the two-byte UTF-16 representation, which equals the code point for characters in the Basic Multilingual Plane, Telugu included):

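// formats each UTF-16 code unit as four uppercase hex digits
static String toHex(String b) {
  StringBuilder s = new StringBuilder();
  for (int i = 0; i < b.length(); ++i) s.append(String.format("%04X", b.charAt(i) & 0xffff));
  return s.toString();
}
System.out.println(toHex("ఈ")); // prints 0C08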
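
The char-based loop above works on UTF-16 code units, so a character outside the Basic Multilingual Plane would come out as two surrogate values. A possible variant, sketched here under the assumption that whole code points are wanted, iterates with String.codePoints() instead. The class name CodePointHex and the space separator are choices made for this illustration, not part of the original answer.

```java
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
final class CodePointHex {

    // Formats each Unicode code point as at least four uppercase hex digits,
    // separated by spaces so that adjacent values stay distinguishable.
    static String toCodePointHex(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(toCodePointHex("\u0C08"));       // prints 0C08
        System.out.println(toCodePointHex("\uD83D\uDE00")); // prints 1F600 (one code point, two chars)
    }
}
```

For Telugu text both approaches give the same digits, since the Telugu block (U+0C00 to U+0C7F) lies inside the Basic Multilingual Plane.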

Java strings are UTF-16. To get UTF-8, you write something like:

// requires: import java.nio.charset.Charset;
String string = "SomethingInTeluguOrwhatever";
byte[] utf8Bytes = string.getBytes(Charset.forName("UTF-8")); // StandardCharsets.UTF_8 works as well

That gets you the UTF-8 bytes. If you want hex, iterate over the bytes and print each one in hex.
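
As a sketch of that last step, and assuming Java 17 or later is available, java.util.HexFormat can do the byte-to-hex iteration for you. The class name Utf8HexDump and its method names are hypothetical, chosen only for this example; on older Java versions a manual loop with String.format("%02X", ...) does the same job.

```java
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Hypothetical class name, used only for this illustration (assumes Java 17+).
final class Utf8HexDump {

    // Encodes the text as UTF-8 and returns the bytes as uppercase hex digits.
    static String utf8Hex(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return HexFormat.of().withUpperCase().formatHex(utf8Bytes);
    }

    // Same as utf8Hex, but with a space between bytes for readability.
    static String utf8HexSpaced(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return HexFormat.ofDelimiter(" ").withUpperCase().formatHex(utf8Bytes);
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u0C08"));       // prints E0B088
        System.out.println(utf8HexSpaced("\u0C08")); // prints E0 B0 88
    }
}
```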
