
How do I convert a Unicode string to Turkish in Java?

Hi, I want to convert the Unicode value "\u20BA" to the equivalent Turkish string. Can anybody help me, please?

I used the following code:

try {
  String string = "\u20BA";
  System.out.println(string + " " + string.toLowerCase());
  // Locale.setDefault(new Locale("tr"));
  // Locale tr = new Locale("TR","tr");
  byte[] converttoBytes = string.toLowerCase().getBytes("UTF-8");
  string = new String(converttoBytes, "Cp1254");
  System.out.println(string + " " + string.toLowerCase());
} catch (Exception e) {
 e.printStackTrace();
}

Think of a String in Java as a sequence of characters independent of any character encoding. It therefore does not make sense to speak about changing the encoding of a String.

Character encodings only come into play when you convert between characters and bytes. This usually happens when you read or write characters from/to a stream of bytes (for example, a file). If you don't specify the encoding explicitly, the platform encoding is used.

In case of difficulties make sure your platform encoding is set correctly or specify the correct encoding explicitly.
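
To make the character/byte distinction concrete, here is a minimal sketch (the class name EncodingDemo and the helper hex are made up for illustration). It encodes the lira sign to bytes with an explicit charset, decodes it back with the same charset, and then decodes it with a different one to show how the mismatch garbles the text:

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: formats bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String lira = "\u20BA"; // one character, no encoding attached

        // Characters -> bytes: this is the point where an encoding is chosen.
        byte[] utf8 = lira.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8));                 // E2 82 BA

        // Bytes -> characters with the same encoding restores the string.
        String roundTrip = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(lira));    // true

        // Decoding with a different encoding gives one wrong character per byte.
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length());          // 3
    }
}
```

The code in the question does the same thing as the last step: it encodes to UTF-8 and then decodes those bytes as Cp1254, which cannot give the original character back.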

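You're specifying the code point for a single character. Java translates the escape the same way in a String literal and in a char literal, so "\u20BA" is already one character; it would only be six separate characters if the backslash were itself escaped ("\\u20BA") or the text arrived at runtime. For your specific question, you can also build the string from the character value: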

StringBuilder sb = new StringBuilder();
sb.append('\u20BA');
System.out.println(sb.toString());

Note that the Unicode value is in single quotes, which makes it a single char value. As you may have guessed, you can keep appending other Unicode values this way to build a string. However, as has been mentioned, this might not be the best answer to whatever underlying problem you're working on.
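
As a quick check of how many characters each spelling produces, here is a small sketch (the class name LiraLiteralDemo and the helper fromCodePoint are made up for illustration). The compiler translates the escape in both String and char literals, so only the doubled-backslash form yields six characters:

```java
public class LiraLiteralDemo {
    // Hypothetical helper: builds a string from a numeric code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String literal = "\u20BA";   // escape inside a String literal
        String appended = new StringBuilder().append('\u20BA').toString(); // char literal
        String escaped = "\\u20BA";  // escaped backslash: six literal characters

        System.out.println(literal.length());                       // 1
        System.out.println(literal.equals(appended));               // true
        System.out.println(escaped.length());                       // 6
        System.out.println(fromCodePoint(0x20BA).equals(literal));  // true
    }
}
```

Character.toChars is the more general route when the code point is only known at runtime, since it also handles code points above U+FFFF.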

The lira sign (U+20BA) was created in 2012, and neither the CP1254 nor the ISO-8859-9 character set includes it.

This can be shown on Linux with the following commands (U+20BA is encoded in UTF-8 as the three bytes E2 82 BA):

$ echo -e "\xE2\x82\xBA"
₺
$ echo -e "\xE2\x82\xBA" | iconv --from utf8 --to cp1254
iconv: illegal input sequence at position 0
$ echo -e "\xE2\x82\xBA" | iconv --from utf8 --to iso88599
iconv: illegal input sequence at position 0
$ echo -e "\xE2\x82\xBA" | iconv --from utf8 --to cp1254//TRANSLIT
?
$ echo -e "\xE2\x82\xBA" | iconv --from utf8 --to iso88599//TRANSLIT
?
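
The same check can be sketched in Java with CharsetEncoder.canEncode (the class name LiraCharsetCheck and the helper canEncode are made up for illustration). The two Turkish charsets are looked up by name and skipped if the running JVM does not provide them, which is an assumption about the runtime rather than something the commands above establish:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LiraCharsetCheck {
    // Hypothetical helper: true if the charset can represent the text as bytes.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String lira = "\u20BA";
        System.out.println("UTF-8: " + canEncode(StandardCharsets.UTF_8, lira));

        for (String name : new String[] {"windows-1254", "ISO-8859-9"}) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JVM");
                continue;
            }
            Charset cs = Charset.forName(name);
            System.out.println(name + ": " + canEncode(cs, lira));
            // getBytes silently substitutes the charset's replacement byte
            // for characters it cannot map, much like //TRANSLIT above.
            System.out.println(new String(lira.getBytes(cs), cs));
        }
    }
}
```

Unlike iconv without //TRANSLIT, String.getBytes does not fail on an unmappable character, so a check like canEncode is the way to find out before data is lost.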
