Convert Unicode value to String in Java
I am trying to extract currencies from my texts. I get the currencies from a db, which also contains special currency symbols. For example, for the pound I have the unicode of the pound sign, "\u00A3", in the db, along with other identifiers such as "gbp".
I am trying to get the corresponding symbol from the unicode and compare it with my text in a loop, as suggested here.
But when I evaluate my code, the comparison fails. Here is the code:
```java
private Optional<Currency> extractTokenWise(Iterable<String> tokens) {
    try {
        for (String aToken : tokens) {
            for (String currency : currencies.keySet()) {
                for (String arep : currencies.get(currency)) {
                    if (arep.startsWith("\\")) { // special character for currency written in unicode representation
                        byte[] charset = arep.getBytes("UTF-8");
                        arep = new String(charset, "UTF-8");
                    }
                    if (aToken.equals(arep)) {
                        return Optional.of(Currency.findProperEnum(currency));
                    }
                }
            }
        }
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return Optional.empty();
}
```
Interestingly, when arep holds the value \u00A3 read from the db, it does not work, but when I specifically give the String literal "\u00A3" in code, it produces the result I want. What am I missing here?
As mentioned in the comments, something like this should work:
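```java
if (arep.startsWith("\\u")) {
    // parse the four hex digits after the backslash-u and turn them into the char they denote
    arep = Character.toString((char) Integer.parseInt(arep.substring(2), 16));
}
```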
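The one-liner above assumes the whole db value is exactly one escape sequence. If, as an assumption, stored values could also mix escapes with ordinary text, a small regex-based decoder can convert every backslash-u-XXXX sequence and leave everything else alone. This is a sketch; the class name UnicodeEscapeDecoder is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeDecoder {

    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces each such escape with the char it denotes; other text is copied as-is.
    public static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(text, last, m.start());                   // text before the escape
            sb.append((char) Integer.parseInt(m.group(1), 16)); // the decoded char
            last = m.end();
        }
        sb.append(text, last, text.length());                   // remaining text
        return sb.toString();
    }

    public static void main(String[] args) {
        String fromDb = "\\u00A3";                    // six characters of escape text
        String decoded = decode(fromDb);
        System.out.println(decoded.length());         // 1
        System.out.println(decoded.equals("\u00A3")); // true
        System.out.println(decode("gbp"));            // gbp
    }
}
```

Each escape becomes one UTF-16 char, so a character outside the BMP stored as two escapes (a surrogate pair) also comes out correctly.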
I think you are mixing up Unicode escape sequences in Java code with strings that contain such escape sequences.
```java
String poundSign = "\u00A3";
```

assigns poundSign a string containing the single character £. This string has a length of 1 character. In the class file it is stored in modified UTF-8 and occupies 2 bytes; in memory it is one 2-byte UTF-16 char (since Java 9, compact strings may store a Latin-1-only string like this one in a single byte).
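A small check makes the length and byte counts concrete. The class name is hypothetical, and standard UTF-8 stands in for the class file's modified UTF-8, which encodes U+00A3 identically.

```java
import java.nio.charset.StandardCharsets;

public class PoundSignLength {
    public static void main(String[] args) {
        // The compiler turns the escape inside this literal into the pound sign itself.
        String poundSign = "\u00A3";
        System.out.println(poundSign.length());                                   // 1
        System.out.println((int) poundSign.charAt(0));                            // 163 (0xA3)
        System.out.println(poundSign.getBytes(StandardCharsets.UTF_8).length);    // 2
        System.out.println(poundSign.getBytes(StandardCharsets.UTF_16BE).length); // 2
    }
}
```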
It looks like arep contains the string \u00A3, as assigned by String unicodeEscapeForPoundSign = "\\u00A3";. That is what your first if statement tests for. It contains the Unicode escape sequence as used in Java code, but not the character this escape sequence represents. It contains the 6 characters '\\', 'u', '0', '0', 'A', and '3' (as your IDE shows).
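The difference is easy to see by inspecting the escape text directly (class name hypothetical). Because the backslash is doubled, the compiler does not treat the sequence as an escape, and the string keeps all six characters.

```java
public class EscapeText {
    public static void main(String[] args) {
        // Doubled backslash: the literal holds a real backslash followed by
        // the letter u and four hex digits, so nothing is decoded.
        String unicodeEscapeForPoundSign = "\\u00A3";
        System.out.println(unicodeEscapeForPoundSign.length());         // 6
        System.out.println(unicodeEscapeForPoundSign.startsWith("\\")); // true, so the question's if fires
        // Comparing the escape text with the real pound sign fails.
        System.out.println(unicodeEscapeForPoundSign.equals("\u00A3")); // false
    }
}
```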
arep.getBytes("UTF-8") returns the UTF-8 encoding of just these six characters (six bytes, one per ASCII character), and new String(charset, "UTF-8") decodes that array back to the string \u00A3, not to the string £. Encoding and decoding never interpret escape sequences.
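The round trip from the question can be reproduced in isolation (class name hypothetical). Using StandardCharsets.UTF_8 instead of the "UTF-8" string also avoids the checked UnsupportedEncodingException, but the result is the same: the bytes are just the six ASCII codes, and decoding them gives back the escape text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTrip {
    public static void main(String[] args) {
        String arep = "\\u00A3"; // the six-character escape text from the db
        byte[] bytes = arep.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(bytes)); // [92, 117, 48, 48, 65, 51]
        String back = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(back.equals(arep));      // true: same six characters
        System.out.println(back.equals("\u00A3"));  // false: still not the pound sign
    }
}
```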
The solution depends on what you get from your database. Assuming you have a mapping from the db value to a Currency object or the ISO currency code, you won't need your first if statement; just make sure arep contains the correct string:
String arep = "\u00A3" (a single pound character), not
String arep = "\\u00A3" (the Java Unicode escape sequence for the pound character, i.e. six characters of text)
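A minimal sketch of that mapping approach follows. It assumes the db rows look like the hypothetical CURRENCIES map below (ISO code mapped to its representations, with the pound sign stored as escape text). Each representation is decoded once with the same conversion as the first answer, and the results go into a reverse index, so the token loop compares real characters. It swaps in java.util.Currency for the asker's own Currency enum and findProperEnum, which are not shown; Map.of and List.of need Java 9 or later.

```java
import java.util.Currency;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class CurrencyLookup {

    // Hypothetical sample of what the db might return.
    static final Map<String, List<String>> CURRENCIES = Map.of(
            "GBP", List.of("\\u00A3", "gbp"),
            "USD", List.of("$", "usd"));

    // Same conversion as the first answer: a value that is exactly one
    // backslash-u-XXXX escape becomes the single char it denotes.
    static String decodeEscape(String rep) {
        if (rep.length() == 6 && rep.startsWith("\\u")) {
            return Character.toString((char) Integer.parseInt(rep.substring(2), 16));
        }
        return rep;
    }

    // Builds the reverse index once: decoded representation to ISO code.
    static Map<String, String> buildIndex(Map<String, List<String>> currencies) {
        Map<String, String> index = new HashMap<>();
        for (Map.Entry<String, List<String>> e : currencies.entrySet()) {
            for (String rep : e.getValue()) {
                index.put(decodeEscape(rep), e.getKey());
            }
        }
        return index;
    }

    // Returns the currency of the first token found in the index.
    static Optional<Currency> extractTokenWise(Iterable<String> tokens, Map<String, String> index) {
        for (String token : tokens) {
            String code = index.get(token);
            if (code != null) {
                return Optional.of(Currency.getInstance(code));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> index = buildIndex(CURRENCIES);
        List<String> tokens = List.of("costs", "\u00A3", "12");
        System.out.println(extractTokenWise(tokens, index)); // Optional[GBP]
    }
}
```

Decoding once while building the index also means the escape conversion no longer runs for every token, unlike the nested loops in the question.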