
URL decoding Japanese characters etc. in Java

I have a servlet that receives some POST data. Because this data is x-www-form-urlencoded, a string such as サボテン arrives encoded as &#12469;&#12508;&#12486;&#12531;.

How would I decode this string back to the correct characters? I have tried URLDecoder.decode("encoded string", "UTF-8"); but it makes no difference.

The reason I would like to decode them is that, before I display this data on a webpage, I escape & to &amp;. At the moment this also escapes the & characters in the encoded string, so the characters do not show up properly.

Those are not URL encodings. URL-encoded, it would have looked like %E3%82%B5%E3%83%9C%E3%83%86%E3%83%B3. Those are decimal HTML/XML entities. To unescape HTML/XML entities, use Apache Commons Lang StringEscapeUtils.
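To make the difference concrete, here is a minimal sketch using only the JDK. It is not the StringEscapeUtils approach recommended above: the class EntityDecoder and the method decodeNumericEntities are hypothetical names invented for this illustration, and the sketch handles only numeric references (decimal and hexadecimal), not named entities such as &amp;.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class EntityDecoder {
    // Matches decimal (&#12469;) and hexadecimal (&#x30B5;) numeric character references.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decodeNumericEntities(String input) {
        Matcher m = NUMERIC_ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            // Leave references to values outside the Unicode range untouched.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String entities = "&#12469;&#12508;&#12486;&#12531;";
        String urlEncoded = "%E3%82%B5%E3%83%9C%E3%83%86%E3%83%B3";
        // URLDecoder leaves the entity string unchanged: nothing in it is percent-encoded.
        System.out.println(URLDecoder.decode(entities, "UTF-8"));
        // The numeric references decode to the four katakana characters from the question.
        System.out.println(decodeNumericEntities(entities));
        // The percent-encoded form is what URLDecoder is actually meant for.
        System.out.println(URLDecoder.decode(urlEncoded, "UTF-8"));
    }
}
```

Whether the console shows the decoded characters correctly depends on the terminal's own encoding.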


Update as per the comments: you will get question marks when the response encoding is not UTF-8. If you're using JSP, just add the following line to the top of the page:

<%@ page pageEncoding="UTF-8" %>

For more detail, see the solutions about halfway through this article. I would prefer using UTF-8 all the way over fiddling with regexps, since regexps don't prepare you for world domination.
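The question marks mentioned above can be reproduced outside a servlet container. The following is a small sketch, assuming ISO-8859-1 as the stand-in for a non-UTF-8 response encoding; the class name ResponseEncodingDemo and the method roundTrip are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only imitates what happens when text is written in a given charset.
final class ResponseEncodingDemo {
    // Encodes the text to bytes in the given charset and reads the bytes back,
    // roughly what a client sees when the response is written in that charset.
    static String roundTrip(String text, Charset responseCharset) {
        byte[] bytes = text.getBytes(responseCharset);
        return new String(bytes, responseCharset);
    }

    public static void main(String[] args) {
        String cactus = "\u30B5\u30DC\u30C6\u30F3"; // the katakana word from the question
        // ISO-8859-1 cannot represent katakana, so each character becomes '?'.
        System.out.println(roundTrip(cactus, StandardCharsets.ISO_8859_1));
        // UTF-8 covers the characters, so the text survives intact.
        System.out.println(roundTrip(cactus, StandardCharsets.UTF_8).equals(cactus));
    }
}
```

String.getBytes(Charset) silently substitutes the charset's replacement byte for unmappable characters, which is why the loss shows up as question marks rather than as an exception.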

This is a feature/bug of browsers. If a web page uses a limited charset, say ASCII, and users type characters outside that charset into a form field, browsers will send these characters in the form &#xxxx;

This can be a problem, because if users actually type &#xxxx; it will be sent as is. So the server has no way to distinguish the two cases.

The best way is to use a charset that covers all characters, like UTF-8, so browsers won't do this trick.
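The ambiguity is easy to see in a rough imitation of that browser behaviour. This is only a sketch under stated assumptions: real browsers implement the fallback themselves, and the class FormCharsetFallback and the method encodeForCharset are hypothetical names used for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical imitation of the browser fallback; not an actual browser or servlet API.
final class FormCharsetFallback {
    // Characters the page charset cannot represent become decimal character references;
    // everything else passes through unchanged.
    static String encodeForCharset(String text, Charset pageCharset) {
        CharsetEncoder encoder = pageCharset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String typedKatakana = "\u30B5";   // the user typed one katakana character
        String typedLiteral = "&#12469;";  // the user literally typed the reference text
        // With an ASCII page, both inputs produce the same submitted text.
        System.out.println(encodeForCharset(typedKatakana, StandardCharsets.US_ASCII));
        System.out.println(encodeForCharset(typedLiteral, StandardCharsets.US_ASCII));
        // With a UTF-8 page no fallback happens, so the two inputs stay distinguishable.
        System.out.println(
                encodeForCharset(typedKatakana, StandardCharsets.UTF_8).equals(typedKatakana));
    }
}
```

Both ASCII calls yield &#12469;, which is exactly the case the server cannot tell apart.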

Just a wild guess, but are you using Tomcat?

If so, make sure you have set up the Connector in Tomcat with a URIEncoding of UTF-8. Search for that and you will find plenty of hits, such as:

How to get UTF-8 working in Java webapps?

How about a regular expression?

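import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Escape only ampersands that do not already start a numeric reference or named entity.
Pattern pattern = Pattern.compile("&(?!#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)");
Matcher matcher = pattern.matcher(inputStr);
String output = matcher.replaceAll("&amp;");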
