
Convert Unicode decimal codes in a string

I have a string like 4,0 &#151; 10,0 and I need to decode it to: 4,0 — 10,0. The code can be checked at https://www.codetable.net/decimal/151

I tried Apache's StringEscapeUtils.unescapeJava without any luck.

It is a numeric entity (numeric character reference), common in HTML, XML, and their base, SGML.

Try Apache's StringEscapeUtils.unescapeHtml4 (or unescapeHtml3; in Commons Lang 2 the method is unescapeHtml). This also takes care of named entities like &mdash;.

Or do it yourself:

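// Requires Java 9+ and java.util.regex.Matcher / java.util.regex.Pattern.
Pattern entityPattern = Pattern.compile("&#(\\d+);");
String s = "4,0 &#151; 10,0";
// quoteReplacement keeps a decoded '$' or '\' from being read as a group reference.
s = entityPattern.matcher(s).replaceAll(mr
        -> Matcher.quoteReplacement(
                new String(new int[] {Integer.parseInt(mr.group(1))}, 0, 1)));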

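This creates a string containing the single Unicode code point 151 (U+0097). Note that U+0097 is a C1 control character; browsers read &#151; through Windows-1252 as the em dash, U+2014 (&#8212;). For hexadecimal numeric entities: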

Pattern entityPattern = Pattern.compile("&#x([\\da-f]+);",
        Pattern.CASE_INSENSITIVE);
String s = "4,0 &#x97; 10,0";
s = entityPattern.matcher(s).replaceAll(mr
        -> Matcher.quoteReplacement(
                new String(new int[] {Integer.parseInt(mr.group(1), 16)}, 0, 1)));
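The decimal and hexadecimal cases can also be folded into one helper. What follows is only a minimal sketch, not part of the original answer: the class name NumericEntities and the method decode are made up for illustration, and it assumes Java 9 or later for Matcher.replaceAll(Function). Named entities such as &mdash; are deliberately left untouched, as are references whose number is not a valid code point.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntities {
    // Group 1 holds hex digits (after "&#x"), group 2 holds decimal digits (after "&#").
    private static final Pattern ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|(\\d+));");

    // Hypothetical helper: replaces every numeric entity with the character it names.
    public static String decode(String input) {
        return ENTITY.matcher(input).replaceAll(mr -> {
            String replacement;
            try {
                int codePoint = mr.group(1) != null
                        ? Integer.parseInt(mr.group(1), 16)
                        : Integer.parseInt(mr.group(2));
                replacement = Character.isValidCodePoint(codePoint)
                        ? new String(Character.toChars(codePoint))
                        : mr.group();   // out of range: keep the reference as written
            } catch (NumberFormatException e) {
                replacement = mr.group(); // too many digits for an int: keep as written
            }
            // Escape '$' and '\' so they are not treated as group references.
            return Matcher.quoteReplacement(replacement);
        });
    }

    public static void main(String[] args) {
        System.out.println(decode("4,0 &#8212; 10,0"));
    }
}
```

Using Character.toChars instead of the int[] constructor is just a matter of taste; both build a string from a single code point, including ones outside the Basic Multilingual Plane.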

If this string came from an HTML form into which the user typed or pasted special characters, the form is probably missing the accept-charset attribute:

<form action="..." accept-charset="UTF-8">

Without it, the browser converts characters that the form's encoding cannot represent into numeric entities.

This assumes that the web server already uses UTF-8 for its pages.
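Why the encoding matters can be checked from Java directly. The sketch below is an illustration only: the class name CharsetCheck and its method are hypothetical, and ISO-8859-1 with the Cyrillic letter U+0416 is an arbitrary example of a legacy encoding meeting a character it lacks (the original does not say which encoding was involved).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    // Returns true if every character of text can be encoded in the given charset.
    public static boolean canRepresent(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String cyrillic = "\u0416"; // Cyrillic capital letter ZHE
        System.out.println(canRepresent(StandardCharsets.ISO_8859_1, cyrillic));
        System.out.println(canRepresent(StandardCharsets.UTF_8, cyrillic));
    }
}
```

A form submitted in an encoding that lacks the character has to substitute something for it, which is where the numeric entities come from; with UTF-8 no substitution is needed.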
