
Get Unicode Encoded Characters (Kannada Language) from a given String

String s1="\u0048\u0065\u006C\u006C\u006F";   // Hello
String s2="\u0CAE\u0CC1\u0C96\u0CAA\u0CC1\u0C9F";  // ಮುಖಪುಟ (Kannada Language)

System.out.println("s1: " + StringEscapeUtils.unescapeJava(s1));  // s1: Hello
System.out.println("s2: " + StringEscapeUtils.unescapeJava(s2));  // s2: ??????

When I print s1, I get Hello as expected. When I print s2, I get ?????? instead.

I want the output for s2 to be ಮುಖಪುಟ. How can I achieve this?

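 // Requires java.io.ByteArrayOutputStream and java.io.PrintStream
 ByteArrayOutputStream os = new ByteArrayOutputStream();
 // Encode explicitly as UTF-8 so the bytes match the decoding below
 PrintStream ps = new PrintStream(os, true, "UTF-8");
 ps.println("\u0048\u0065\u006C\u006C\u006F \u0CAE\u0CC1\u0C96\u0CAA\u0CC1\u0C9F");
 String output = os.toString("UTF-8");
 System.out.println("result: " + output);   // result: Hello ಮುಖಪುಟ (if the console can display UTF-8)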

You need to specify an encoding such as "UTF-8". Try this:

String s1="\u0048\u0065\u006C\u006C\u006F";   // Hello
String s2="\u0CAE\u0CC1\u0C96\u0CAA\u0CC1\u0C9F";  // ಮುಖಪುಟ (Kannada Language)

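// Note: encoding to UTF-8 and decoding again yields the same String,
// so what is displayed still depends on the encoding of the console.
System.out.println("s1: " + new String(s1.getBytes("UTF-8"), "UTF-8"));
System.out.println("s2: " + new String(s2.getBytes("UTF-8"), "UTF-8"));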

If you are using Eclipse, have a look at: https://decoding.wordpress.com/2010/03/18/eclipse-how-to-change-the-console-output-encoding/

Simply print the strings to the console as follows:

String s1="\u0048\u0065\u006C\u006C\u006F";   
String s2="\u0CAE\u0CC1\u0C96\u0CAA\u0CC1\u0C9F";
System.out.println("s1: " + s1);  // s1
System.out.println("s2: " + s2);  // s2

Hope this helps.

The problem is most probably that System.out is not prepared to deal with Unicode. It is an output stream whose characters get encoded in the so-called default encoding.

The default encoding is often (e.g. on Windows) some proprietary 8-bit character set that simply can't represent most Unicode characters, so each Kannada character is replaced with a ?.
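To see this effect without involving a console at all, here is a minimal sketch. The class name DefaultEncodingDemo and the helper roundTrip are made up for illustration, and ISO-8859-1 merely stands in for "some 8-bit character set"; the charset your console really uses may be a different one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultEncodingDemo {

    // Hypothetical helper: encode the text with the given charset and decode
    // it again, which is roughly what happens on the way to a console.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // The Kannada word from the question
        String s2 = "\u0CAE\u0CC1\u0C96\u0CAA\u0CC1\u0C9F";

        // The charset the JVM treats as its default on this machine
        System.out.println("default charset: " + Charset.defaultCharset());

        // An 8-bit charset has no Kannada characters: each becomes '?'
        System.out.println("ISO-8859-1: " + roundTrip(s2, StandardCharsets.ISO_8859_1));

        // UTF-8 can encode every Unicode character, so nothing is lost
        System.out.println("UTF-8 lossless: " + roundTrip(s2, StandardCharsets.UTF_8).equals(s2));
    }
}
```

If the first line reports something other than UTF-8, that is a strong hint as to where the question marks come from. Note that the JVM default is not necessarily the encoding your console uses.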

My tip: For the sake of testing, create your own PrintStream or PrintWriter with UTF-8 encoding.
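A minimal sketch of that tip follows. The class name Utf8ConsoleDemo and the helper utf8Stream are invented for this example; the PrintStream(OutputStream, boolean, String) constructor is standard Java.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8ConsoleDemo {

    // Hypothetical helper: wrap a stream in an auto-flushing PrintStream
    // that encodes everything as UTF-8, whatever the JVM default is.
    public static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this is unexpected
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String s1 = "\u0048\u0065\u006C\u006C\u006F";
        String s2 = "\u0CAE\u0CC1\u0C96\u0CAA\u0CC1\u0C9F";

        // Write through the UTF-8 stream instead of System.out directly
        PrintStream out = utf8Stream(System.out);
        out.println("s1: " + s1);
        out.println("s2: " + s2);
    }
}
```

This only fixes the Java side. The terminal or IDE console still has to interpret the bytes as UTF-8 and needs a font that contains Kannada glyphs, otherwise the output will remain unreadable.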
