
How do I convert a Unicode string to Turkish in Java?

Hi, I want to convert the Unicode value "\u20BA" to the equivalent Turkish string. Can anybody help me, please?

I used the following code:

try {
  String string = "\u20BA";
  System.out.println(string + " " + string.toLowerCase());
  // Locale.setDefault(new Locale("tr"));
  // Locale tr = new Locale("TR","tr");
  byte[] converttoBytes = string.toLowerCase().getBytes("UTF-8");
  string = new String(converttoBytes, "Cp1254");
  System.out.println(string + " " + string.toLowerCase());
} catch (Exception e) {
  e.printStackTrace();
}

Think of a String in Java as a sequence of characters, independent of any character encoding. It therefore does not make sense to speak about changing the encoding of a String.

Character encodings only come into play when you convert between characters and bytes. This usually happens when you read characters from, or write them to, a stream of bytes (for example, a file). If you don't specify the encoding explicitly, the platform encoding is used.
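As a rough illustration of that point, the sketch below encodes the lira sign to bytes and decodes those bytes twice: once with the same charset and once with a different one. The class name `EncodingRoundTrip` and the helper `decodeAs` are made up for this example, and it assumes the JDK in use ships the windows-1254 charset (full JDKs normally do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Hypothetical helper: decodes raw bytes with the given charset.
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String lira = "\u20BA";

        // Characters -> bytes: this is the only place an encoding is involved.
        byte[] utf8Bytes = lira.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 byte count: " + utf8Bytes.length);

        // Decoding with the same charset gives the original string back.
        String roundTrip = decodeAs(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println("Round trip equal: " + roundTrip.equals(lira));

        // Decoding the same bytes with another charset gives different
        // characters; the String was not "converted", it was misread.
        String misread = decodeAs(utf8Bytes, Charset.forName("windows-1254"));
        System.out.println("Misread equal: " + misread.equals(lira));
        System.out.println("Misread length: " + misread.length());
    }
}
```

This mirrors what the code in the question does: the UTF-8 bytes are reinterpreted as Cp1254, so each byte becomes its own unrelated character.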

If you run into difficulties, make sure your platform encoding is set correctly, or specify the correct encoding explicitly.
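One way to specify the encoding explicitly is to wrap the output stream yourself instead of relying on the platform default. The following is only a sketch; `ExplicitEncodingDemo` and `writeUtf8` are names invented for this example, and whether the console then shows the sign correctly still depends on the terminal and its font.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ExplicitEncodingDemo {

    // Hypothetical helper: writes text through a PrintStream that is
    // explicitly configured for UTF-8 and returns the bytes produced.
    static byte[] writeUtf8(String text) throws UnsupportedEncodingException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, "UTF-8");
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Show the bytes that an explicitly UTF-8 stream emits for the lira sign.
        for (byte b : writeUtf8("\u20BA")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();

        // The same idea applied to standard output.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("\u20BA");
    }
}
```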

The key is whether the escape reaches Java as one character or as plain text. In a source literal such as "\u20BA", the compiler converts the escape to a single character. If the string instead holds the literal text \u20BA (for example, it was read from a file or written as "\\u20BA"), Java sees 6 separate characters. To build the character from its code point, try this for your specific question:

StringBuilder sb = new StringBuilder();
sb.append('\u20BA');
System.out.println(sb.toString());

Note that the Unicode value is in single quotes, so it is a single character value. As you may have guessed, you can keep appending other Unicode values in this way to build a string. However, as has been mentioned, this might not be the best answer to whatever underlying problem you're working on.
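If the escape arrives as text at run time (six characters: a backslash, `u`, and four hex digits), it has to be parsed by hand. Here is a minimal sketch of that, assuming the input is exactly one such escape; `UnicodeEscapeDemo` and `unescape` are hypothetical names, not part of any library.

```java
public class UnicodeEscapeDemo {

    // Hypothetical helper: turns the six-character escape text
    // (backslash, 'u', four hex digits) into the one character it names.
    static String unescape(String escaped) {
        if (escaped.length() != 6 || !escaped.startsWith("\\u")) {
            throw new IllegalArgumentException(
                    "expected a backslash, 'u' and four hex digits");
        }
        int codeUnit = Integer.parseInt(escaped.substring(2), 16);
        return String.valueOf((char) codeUnit);
    }

    public static void main(String[] args) {
        String literal = "\u20BA";   // the compiler converts the escape
        String rawText = "\\u20BA";  // escaped backslash: plain text

        System.out.println(literal.length());                   // prints 1
        System.out.println(rawText.length());                   // prints 6
        System.out.println(unescape(rawText).equals(literal));  // prints true
    }
}
```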

The lira sign (U+20BA) was created in 2012, and neither the CP1254 nor the ISO-8859-9 character set includes it.

This can be shown on Linux with the following commands (U+20BA is encoded in UTF-8 as the 3 bytes E2 82 BA):

$ echo -e "\xE2\x82\xBA"
₺
$ echo -e "\xE2\x82\xBA" | iconv --from utf8 --to cp1254
iconv: illegal input sequence at position 0
$ echo -e "\xE2\x82\xBA" | iconv --from utf8 --to iso88599
iconv: illegal input sequence at position 0
$ echo -e "\xE2\x82\xBA" | iconv --from utf8 --to cp1254//TRANSLIT
?
$ echo -e "\xE2\x82\xBA" | iconv --from utf8 --to iso88599//TRANSLIT
?
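A similar check can be made from Java by asking each charset's encoder whether it can represent the character. This is only a sketch: `LiraCharsetCheck` and `canEncode` are names chosen for this example, and it assumes the JDK in use provides the windows-1254 and ISO-8859-9 charsets.

```java
import java.nio.charset.Charset;

public class LiraCharsetCheck {

    // Hypothetical helper: true if the named charset has a byte
    // representation for the given character.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char lira = '\u20BA';
        String[] names = {"windows-1254", "ISO-8859-9", "UTF-8"};
        for (String name : names) {
            System.out.println(name + " can encode U+20BA: "
                    + canEncode(name, lira));
        }

        // String.getBytes does not fail on an unmappable character; it
        // substitutes the charset's replacement bytes instead.
        byte[] replaced = String.valueOf(lira)
                .getBytes(Charset.forName("windows-1254"));
        System.out.println("Bytes written for windows-1254: " + replaced.length);
    }
}
```

Because `getBytes` silently substitutes a replacement, checking with `canEncode` first is one way to detect the loss before it happens.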

