
Convert unicode value to string in java

I am trying to extract currencies from my texts, and I am getting the currencies from a db which contains special currency symbols as well. For example, for the pound, I have the unicode of the pound sign, "\u00A3", in the db along with other identifiers such as "gbp".

I am trying to get the corresponding symbol from the unicode and compare it with my text in a loop, as suggested here.

But when I evaluate my code, the result is as shown in the image here: [image: result]

private Optional<Currency> extractTokenWise(Iterable<String> tokens){
    try {
        for (String aToken : tokens) {
            for (String currency : currencies.keySet()) {
                for (String arep : currencies.get(currency)) {
                    if(arep.startsWith("\\")){ //special character for currency written in unicode representation                  
                        byte[] charset = arep.getBytes("UTF-8");
                        arep = new String(charset, "UTF-8");
                    }
                    if (aToken.equals(arep)) {
                        return Optional.of(Currency.findProperEnum(currency));
                    }
                }
            }
        }
    }catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return Optional.empty();
}

It is interesting that when arep is equal to "\u00A3" it does not work, but when I specifically give it the String value "\u00A3" it produces the result I want. What am I missing here?

As mentioned in the comments, something like this should work:

if (arep.startsWith("\\u")) {
    arep = Character.toString((char) Integer.parseInt(arep.substring(2), 16));
}
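The snippet above throws a NumberFormatException if the text after the prefix is not valid hex. A slightly more defensive variant could look like the sketch below. The class name UnicodeEscapeDecoder and the method decode are made up for illustration, and the sketch assumes the db value is exactly one escape for a character in the Basic Multilingual Plane (backslash, 'u', four hex digits).

```java
public class UnicodeEscapeDecoder {

    /**
     * Hypothetical helper: if the value is exactly a backslash, the letter u and
     * four hex digits, return the single character it names; otherwise return
     * the value unchanged.
     */
    static String decode(String value) {
        // The regex matches a literal backslash, 'u', then exactly four hex digits.
        if (value != null && value.matches("\\\\u[0-9a-fA-F]{4}")) {
            int codeUnit = Integer.parseInt(value.substring(2), 16);
            return String.valueOf((char) codeUnit);
        }
        return value;
    }

    public static void main(String[] args) {
        String fromDb = "\\u00A3";          // six characters, as stored in the db
        String decoded = decode(fromDb);

        System.out.println(decoded.length());          // 1
        System.out.println(decoded.equals("\u00A3"));  // true
        System.out.println(decode("gbp"));             // gbp
    }
}
```

Plain identifiers such as "gbp" pass through untouched, so the same call can be applied to every representation without a separate startsWith check.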

I think you are mixing up unicode escape sequences in java code with strings that contain such escape sequences.

String poundSign = "\u00A3"; assigns poundSign a string containing the single character £. This string has a length of 1 character. In memory and in the class file it will occupy 2 bytes.

It looks like arep contains the string as assigned by String unicodeEscapeForPoundSign = "\\u00A3"; which is what your first if statement tests for. It contains the unicode escape sequence as used in java code, but not the character this escape sequence represents. It contains the 6 characters '\\', 'u', '0', '0', 'A', and '3' (as your IDE shows). arep.getBytes("UTF-8"); returns an array holding the bytes of just these six characters, and new String(charset, "UTF-8"); converts the array back to the same six-character string \u00A3 and not to the string £.
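The difference can be checked with a small standalone program. This is only an illustrative sketch; the class name EscapeVersusCharacter and the method roundTrip are invented for the example.

```java
import java.nio.charset.StandardCharsets;

public class EscapeVersusCharacter {

    // Encodes the text to UTF-8 bytes and decodes it again, as the question's code does.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String poundSign = "\u00A3";     // the compiler turns the escape into one character
        String escapedText = "\\u00A3";  // backslash, u, 0, 0, A, 3 as ordinary characters

        System.out.println(poundSign.length());    // 1
        System.out.println(escapedText.length());  // 6

        // The round trip gives back exactly what went in; nothing is interpreted.
        System.out.println(roundTrip(escapedText).equals(escapedText)); // true
        System.out.println(roundTrip(escapedText).equals(poundSign));   // false

        // UTF-8 sizes: six ASCII characters versus one two-byte character.
        System.out.println(escapedText.getBytes(StandardCharsets.UTF_8).length); // 6
        System.out.println(poundSign.getBytes(StandardCharsets.UTF_8).length);   // 2
    }
}
```

Using StandardCharsets.UTF_8 instead of the charset name "UTF-8" also removes the need to catch UnsupportedEncodingException.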

The solution depends on what you get from your database. Assuming you have a mapping from the db value to a Currency object or the ISO currency code, you won't need your first if statement; just make sure arep contains the correct string:

  • String arep = "\u00A3" (single pound character)
  • String arep = "\\u00A3" (pound character java unicode escape string)
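If the database does hold the six-character escape text, one option is to convert it once when the currency data is loaded, so that the matching loop only needs equals. The sketch below assumes a Map from currency code to a list of representations, similar in shape to the currencies map in the question; the class name CurrencyRepresentations and both methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CurrencyRepresentations {

    // Hypothetical helper: decodes text of the form backslash, 'u', four hex digits.
    static String decode(String value) {
        if (value.matches("\\\\u[0-9a-fA-F]{4}")) {
            return String.valueOf((char) Integer.parseInt(value.substring(2), 16));
        }
        return value;
    }

    // Returns a copy of the map in which every representation has been decoded.
    static Map<String, List<String>> normalize(Map<String, List<String>> raw) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : raw.entrySet()) {
            List<String> decoded = new ArrayList<>();
            for (String representation : entry.getValue()) {
                decoded.add(decode(representation));
            }
            result.put(entry.getKey(), decoded);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the rows read from the db.
        Map<String, List<String>> raw = new LinkedHashMap<>();
        raw.put("GBP", Arrays.asList("gbp", "\\u00A3"));

        Map<String, List<String>> currencies = normalize(raw);
        System.out.println(currencies.get("GBP").contains("\u00A3")); // true
        System.out.println(currencies.get("GBP").contains("gbp"));    // true
    }
}
```

With the map normalized up front, the token loop can compare aToken against each representation directly.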
