
UTF-8 decoding problems in Java & Tomcat7

I'm sending an AJAX request to the server, with the parameter value encoded using JavaScript's escape(...) function.

The Tomcat server (7.0.42) is configured so that the receiving Connector has URIEncoding="UTF-8", and in web.xml I have configured the SetCharacterEncodingFilter as follows:

<filter>
    <filter-name>charencode</filter-name>
    <filter-class>
        org.apache.catalina.filters.SetCharacterEncodingFilter
    </filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>charencode</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

Additionally, I have created a filter that sets the response encoding to UTF-8:

@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
        throws IOException, ServletException {
    // Set the response encoding before anything is written to the response.
    response.setCharacterEncoding("UTF-8");
    chain.doFilter(request, response);
}

Parameters containing Latin characters are parsed without problems, but when I try Russian text, request.getParameter(..) returns null. I also get this in the logs (I suspect it comes from the SetCharacterEncodingFilter):

INFO: Character decoding failed. Parameter [usersaid] with value [%u044B%u0432%u0430%u044B%u0432%u0430%u044B%u0432%u044B%u0432%u0430%u044B%u0432%u0430%21] has been ignored. Note that the name and value quoted here may be corrupted due to the failed decoding. Use debug level logging to see the original, non-corrupted values.

And there are no DEBUG-level messages that follow (I believe my logger is set up correctly).
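For reference, the rejected value is in the non-standard %uXXXX format that JavaScript's escape() emits for characters above U+00FF, where each XXXX is a UTF-16 code unit in hex (characters below U+0100 become %XX). A minimal diagnostic sketch, with hypothetical class and helper names, that undoes this format to show the text that was actually sent:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDecodeDemo {
    // Matches escape()'s two output forms: %uXXXX (UTF-16 code unit) or %XX (code unit below 0x100).
    private static final Pattern ESCAPE =
            Pattern.compile("%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})");

    // Hypothetical diagnostic helper: reverses JavaScript escape() output.
    // Intended only for inspecting logged values, not for production parameter parsing.
    static String decodeUnicodeEscapes(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());              // copy literal text before the escape
            String hex = m.group(1) != null ? m.group(1) : m.group(2);
            out.append((char) Integer.parseInt(hex, 16)); // one UTF-16 code unit per escape
            last = m.end();
        }
        out.append(s.substring(last));                   // copy any trailing literal text
        return out.toString();
    }

    public static void main(String[] args) {
        String logged = "%u044B%u0432%u0430%u044B%u0432%u0430%u044B%u0432"
                + "%u044B%u0432%u0430%u044B%u0432%u0430%21";
        // Compare with Unicode escapes to avoid depending on the console encoding.
        System.out.println(decodeUnicodeEscapes(logged).startsWith("\u044B\u0432\u0430"));
    }
}
```

The logged value decodes back to the Russian test string, which suggests the text reached Tomcat intact but in a format its UTF-8 parameter decoder does not accept.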

Could you please advise? Will be happy to answer questions!

Many thanks, Victor.

That string isn't valid URL encoding, so it can't be decoded; this has nothing to do with your application server. Try these tools to see for yourself:

http://www.albionresearch.com/misc/urlencode.php
http://meyerweb.com/eric/tools/dencoder/
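The same check can be done in Java with java.net.URLDecoder. This is a sketch (the class and helper names are hypothetical) showing that standard percent-decoding rejects the %uXXXX form, because "u0" is not a pair of hex digits, while it accepts the UTF-8 percent-encoded form of the same text. Tomcat's own parameter parser is a different implementation, but the log message indicates it rejects the value in a similar way.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeCheck {
    // Returns the decoded value, or null if URLDecoder rejects the input as malformed.
    static String tryDecode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (IllegalArgumentException e) {
            // Thrown for "%u0...": the two characters after '%' are not hex digits.
            return null;
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Form produced by JavaScript escape(): rejected, prints "null".
        System.out.println(tryDecode("%u044B%u0432%u0430%21"));
        // UTF-8 percent-encoded form of the same text: decodes, prints "true".
        System.out.println("\u044B\u0432\u0430!".equals(tryDecode("%D1%8B%D0%B2%D0%B0%21")));
    }
}
```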

So the error looks like it's on the client side. The %uXXXX sequences are what JavaScript's escape() produces; that format is non-standard and does not encode characters as UTF-8. Make sure you URL-encode the value as UTF-8, for example with encodeURIComponent() instead of escape().
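To see what a correctly encoded parameter looks like, here is a small Java sketch (the class and method names are hypothetical) that percent-encodes the UTF-8 bytes of a value with java.net.URLEncoder, following the application/x-www-form-urlencoded rules:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeCheck {
    // Percent-encodes the UTF-8 bytes of a form parameter value
    // (form-encoding rules: space becomes '+', most punctuation becomes %XX).
    static String encodeParam(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Three Cyrillic letters (U+044B U+0432 U+0430) followed by '!'.
        String value = "\u044B\u0432\u0430!";
        System.out.println(encodeParam(value)); // prints %D1%8B%D0%B2%D0%B0%21
    }
}
```

Each Cyrillic letter becomes two %XX bytes, which is what a Connector with URIEncoding="UTF-8" expects. encodeURIComponent() on the client yields the same byte escapes, except that it leaves `!` unescaped and writes spaces as %20 rather than `+`; both forms should decode to the same parameter value on the server.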

Here's a thread on correctly encoding Unicode characters: What is the proper way to URL encode Unicode characters?
