Print string literal unicode as the actual character

In my Java application I have been passed a string that looks like this:

"\u00a5123"

When printing that string to the console, I get the same string as the output (as expected).

However, I want to print it with the unicode escape converted into the actual yen symbol (\u00a5 -> yen symbol). How would I go about doing this?

i.e. so that it looks like this: "[yen symbol]123"

I wrote a little program:

public static void main(String[] args) {
    System.out.println("\u00a5123");
}

Its output:

¥123

i.e. it output exactly what you stated in your post. I am not sure whether something else is going on. What version of Java are you using?

edit:

In response to your clarification, there are a couple of different techniques. The most straightforward is to look for a "\u" followed by 4 hex-code characters, extract that piece and replace it with the character for that hex code (using the Character class). This of course assumes the string will not have a \\u (an escaped backslash before the u) in it.
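A minimal sketch of that search-and-replace idea, using a regular expression rather than the Character class. The class name UnicodeUnescaper and the method name unescape are made up for illustration, and it assumes every escape is a backslash, a 'u' and exactly four hex digits:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescaper {
    // Matches a literal backslash, then 'u', then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical helper: replaces each escape with the character it names.
    public static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Group 1 holds the four hex digits; parse them as a UTF-16 code unit.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops '$' and backslash from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash in source yields one real backslash at runtime.
        System.out.println(unescape("\\u00a5123"));
    }
}
```

This does not treat a doubled backslash in front of the u any differently, so the caveat above still applies. Characters outside the BMP would arrive as two escapes (a surrogate pair), which this loop simply converts one after the other.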

I am not aware of any particular system to parse the String as though it were an encoded Java String.

As has been mentioned before, these strings will have to be parsed to get the desired result.

  1. Tokenize the string by using \u as separator. For example: \u63A5\u53D7 => { "63A5", "53D7" }

  2. Process these strings as follows:

     String hex = "63A5";
     int intValue = Integer.parseInt(hex, 16);
     System.out.println((char) intValue);
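Putting the two steps together might look something like this. It is only a sketch: the class name TokenizeUnescape and the method name decode are invented here, and it assumes each token starts with four hex digits, with anything after them being ordinary text (as with the "123" in the question):

```java
public class TokenizeUnescape {
    // Hypothetical helper combining step 1 (tokenize) and step 2 (convert).
    public static String decode(String input) {
        // Split on the literal two-character separator: a backslash followed by 'u'.
        // The -1 limit keeps trailing empty tokens so nothing is silently dropped.
        String[] tokens = input.split("\\\\u", -1);
        StringBuilder out = new StringBuilder(tokens[0]); // text before the first escape
        for (int i = 1; i < tokens.length; i++) {
            String token = tokens[i];
            if (token.length() < 4) {
                // Too short to be an escape: put the separator back unchanged.
                out.append("\\u").append(token);
                continue;
            }
            // The first four characters are the hex code; the rest is ordinary text.
            int codeUnit = Integer.parseInt(token.substring(0, 4), 16);
            out.append((char) codeUnit).append(token.substring(4));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u63A5\\u53D7"));
        System.out.println(decode("\\u00a5123"));
    }
}
```

If a token does not start with hex digits, Integer.parseInt throws a NumberFormatException, so real input may need a check before the parse.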

You're probably going to have to write a parser for these, unless you can find one in a third-party library. There is nothing in the JDK to parse these for you. I know because I fairly recently had an idea to use this kind of escape as a way to smuggle unicode through a Latin-1-only database. (I ended up doing something else, btw.)

I will tell you that java.util.Properties escapes and unescapes Unicode characters in this manner when reading and writing files (since the files have to be ASCII). The methods it uses for this are private, so you can't call them, but you could use the JDK source code to inspire your solution.
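One possible workaround, which is not part of the answer above: the public Properties.load(Reader) method applies that unescaping while it parses, so the string can be fed through it as a one-line property. The class name PropertiesUnescape and the key "value" are arbitrary choices for this sketch, and it assumes the input is a single line with no other backslash sequences that matter:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesUnescape {
    // Hypothetical helper: lets Properties.load do the unescaping indirectly.
    public static String unescape(String escaped) {
        Properties props = new Properties();
        try {
            // "value" is an arbitrary key used only to carry the text through load().
            props.load(new StringReader("value=" + escaped));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return props.getProperty("value");
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u00a5123"));
    }
}
```

Be aware that load() applies all of the properties-file rules, not just this one: it strips leading whitespace from the value, interprets other backslash sequences, and stops at a line break. That makes it a poor fit for arbitrary input.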

You could replace the above with this:

System.out.println((char)0x63A5);

Here is the code to print all of the box-drawing unicode characters.

public static void printBox()
{
    for (int i=0x2500;i<=0x257F;i++)
    {
        System.out.printf("0x%x : %c\n",i,(char)i);
    }
}
