
Java URLDecoder special characters and UTF-8

Take the string Mediæval%20Bæbes. It can be encoded in the URL as either Medi%E6val+B%E6bes or Mediæval%20Bæbes. With the first I get the correct æ character when decoded. The latter gives me � (the replacement character). I can't figure out how to get Java to decode it both ways, possibly in the same URL. I also tried java.net.URI and Apache's URLCodec.

Thanks

You will never find a solution to this puzzle, because these two strings are in two different encodings. The UTF-8 encoding of æ is %C3%A6; %E6 is its ISO-8859-1 encoding. Each string has to be decoded with the charset it was encoded in, so it can only work like this:

String s1 = URLDecoder.decode("Medi%E6val+B%E6bes", "ISO-8859-1");
String s2 = URLDecoder.decode("Mediæval%20Bæbes", "UTF-8");
String s3 = URLDecoder.decode("Medi%C3%A6val%20B%C3%A6bes", "UTF-8");
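If a single URL really can contain either form, one possible workaround is a heuristic: recover the raw bytes, try a strict UTF-8 decode, and fall back to ISO-8859-1 when the bytes are not valid UTF-8. The sketch below is only an illustration of that idea. The class name LenientUrlDecoder and its decode method are hypothetical, it assumes any unescaped characters in the input fit in ISO-8859-1, and it can guess wrong, because some ISO-8859-1 byte sequences also happen to be valid UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK or of Apache Commons Codec.
class LenientUrlDecoder {

    // Decodes percent-escapes as UTF-8 when the escaped bytes form valid UTF-8,
    // and as ISO-8859-1 otherwise. This is a guess, not a guarantee.
    static String decode(String encoded) {
        String latin1;
        try {
            // ISO-8859-1 maps each byte 0x00-0xFF to the code point of the same
            // value, so this step keeps every escaped byte recoverable.
            latin1 = URLDecoder.decode(encoded, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support ISO-8859-1.
            throw new IllegalStateException(e);
        }
        // Assumption: characters outside ISO-8859-1 do not occur unescaped;
        // they would be replaced with '?' here.
        byte[] raw = latin1.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // REPORT makes the decoder throw instead of silently inserting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so treat the bytes as ISO-8859-1.
            return latin1;
        }
    }
}
```

With this sketch, both Medi%E6val+B%E6bes and Medi%C3%A6val%20B%C3%A6bes should decode to the same text. The more robust fix remains making sure that whoever builds the URL always encodes it as UTF-8.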

The first one is the safe encoding, because every special character is percent-escaped; the second is not, since it contains a raw æ, and therefore does not work.
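To see what a safe, fully escaped form looks like for each charset, the text can be run through java.net.URLEncoder. This is a small sketch; the class name SafeEncodingDemo and its encode wrapper are made up for illustration. Note that URLEncoder writes a space as + rather than %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, used only to illustrate the two encodings.
class SafeEncodingDemo {

    // Thin wrapper that turns the checked exception into an unchecked one.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // \u00e6 is the ae ligature; written as an escape so the source stays ASCII.
        String name = "Medi\u00e6val B\u00e6bes";
        System.out.println(encode(name, "UTF-8"));      // Medi%C3%A6val+B%C3%A6bes
        System.out.println(encode(name, "ISO-8859-1")); // Medi%E6val+B%E6bes
    }
}
```

Both outputs contain only ASCII characters, which is what makes them safe to put in a URL, but each can only be decoded correctly with the charset that produced it.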

See this

Edit: better reference


 