
How to convert Telugu Characters into UTF-8 encoded characters in Java?

I have an input character such as ఈ. For this character I need the equivalent hex value, "0C08". Is there a built-in function in Java for this?

Thanks in advance.

Characters in Java are kept in Unicode internally (a String is a sequence of UTF-16 code units), so an encoding only has to be specified when reading from or writing to an external byte stream.

Note that this code should print the same line twice on a UTF-8 console:

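String value = columnDetails.getColumnName();
System.out.println(value); // output with the default encoding
byte[] utf8 = value.getBytes(StandardCharsets.UTF_8); // requires java.nio.charset.StandardCharsets
System.out.write(utf8, 0, utf8.length); // output as explicit UTF-8 bytes
System.out.println(); // end the second line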
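The point about external byte streams can be sketched with in-memory streams. This is only an illustration under stated assumptions: the class name ExplicitCharsetRoundTrip and its encode/decode helpers are made up for this example, and the Telugu letter is written as the escape \u0C08 so the source file's own encoding does not matter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetRoundTrip {
    // Hypothetical helper: writes text to a byte stream as UTF-8.
    static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Hypothetical helper: reads UTF-8 bytes back into a String.
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("\u0C08"); // the Telugu letter from the question
        System.out.println(bytes.length);                    // 3
        System.out.println(decode(bytes).equals("\u0C08"));  // true
    }
}
```

Naming the charset on both the writer and the reader keeps the result independent of the platform default encoding.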

Edit: If you want hex representation of UTF-8 encoding, then try this:

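// requires: import java.nio.charset.StandardCharsets;
// Two uppercase hex digits per byte
static String toHex(byte[] b) {
  StringBuilder s = new StringBuilder(b.length * 2);
  for (int i = 0; i < b.length; ++i) s.append(String.format("%02X", b[i] & 0xff));
  return s.toString();
}
System.out.println(toHex("ఈ".getBytes(StandardCharsets.UTF_8))); // prints E0B088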

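Edit2: or, if you want the Unicode value (one four-digit hex group per UTF-16 code unit; characters outside the Basic Multilingual Plane produce two groups, a surrogate pair):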

static String toHex(String b) {
  String s = "";
  for (int i = 0; i < b.length(); ++i) s += String.format("%04X", b.charAt(i) & 0xffff);
  return s;
}
System.out.println(toHex("ఈ")); // prints 0C08
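For text that may contain characters above U+FFFF, a variant that walks code points rather than char values can be sketched as follows. The class name CodePointHex, the method toCodePoints, and the U+XXXX output format are choices made for this example, not part of the answer above; String.codePoints() needs Java 8 or later.

```java
public class CodePointHex {
    // Formats each Unicode code point as U+XXXX (at least four hex digits),
    // separated by spaces. Surrogate pairs are combined into one code point.
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u0C08 is the Telugu letter from the question.
        System.out.println(toCodePoints("\u0C08"));        // U+0C08
        // A surrogate pair is reported as a single code point.
        System.out.println(toCodePoints("\uD83D\uDE00"));  // U+1F600
    }
}
```

For Telugu, which lies entirely in the Basic Multilingual Plane, this gives the same digits as the charAt version.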

Java strings are UTF-16. To get UTF-8, you write something like:

String string = "SomethingInTeluguOrwhatever";
// StandardCharsets is in java.nio.charset (Java 7+); Charset.forName("UTF-8") also works
byte[] utf8Bytes = string.getBytes(StandardCharsets.UTF_8);

That gets you the UTF-8 values. If you want hex, iterate the bytes and print them in hex.
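Iterating the bytes and printing them in hex might look like the sketch below. The class name Utf8Hex, the method utf8Hex, and the space-separated output are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Hex {
    // Encodes the string as UTF-8 and returns the bytes as
    // space-separated two-digit uppercase hex values.
    static String utf8Hex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length * 3);
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values do not sign-extend.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u0C08 is the Telugu letter from the question.
        System.out.println(utf8Hex("\u0C08")); // E0 B0 88
    }
}
```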
