Java Tomcat UTF-8 encoding issue

I am developing a simple web application using Java/JSP/Tomcat/MySQL, and the biggest problem is character encoding, because I need to handle UTF-8 instead of the default ISO-8859-1.

First off, I'd like to describe my program structure. I am using a servlet called Controller.java to handle all requests. In web.xml, I have a Controller servlet mapped to *.do, so it receives every such request.

This Controller then dispatches the request based on the requested URL. For example, if the client asks for register.do, Controller dispatches the request to Register.java.

In Register.java, there is a method that takes the request as a parameter, namely:

public String perform(HttpServletRequest request) {
    // do something with the request...
}

The problem is that if I print something in UTF-8 inside this method, the output is garbled. For example, I have an enum that stores several constants, and one of its properties is the constant's name in Traditional Chinese. If I print it from main:

public static void main(String[] args) {
    System.out.println(MyEnum.One.getChn());
    logger.info(MyEnum.One.getChn());
}

This prints the Chinese text correctly. However, if I put exactly the same code inside the method that handles the HttpServletRequest:

public String perform(HttpServletRequest request) {
    System.out.println(MyEnum.One.getChn());
    logger.info(MyEnum.One.getChn());
}

They are printed as garbled characters, but I can see in the debug window (Eclipse) that the variables hold the correct Chinese characters.

The same thing happens when I want to store the value from request.getParameter(). In the debug window, I can see that the variable holds the correct characters, but once I print it out or try to store it in the database, it turns into garbled characters.

I don't know why it behaves like this, and it is blocking me from reading submitted form values and storing them in the database. Could someone give me some hints?

Many thanks.

Here is a short tutorial on what you need to do to make UTF-8 work in your web application:

You have to implement a Filter in your application for character encoding:

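import java.io.IOException;

// On Tomcat 10 and later, these packages are jakarta.servlet.* instead.
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CharacterEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig)
            throws ServletException {
        // nothing to initialize
    }

    @Override
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain)
            throws IOException, ServletException {
        // Must be set before any request parameter is read
        servletRequest.setCharacterEncoding("UTF-8");
        // Declares the response as UTF-8 HTML before the servlet or JSP writes to it
        servletResponse.setContentType("text/html; charset=UTF-8");
        filterChain.doFilter(servletRequest, servletResponse);
    }

    @Override
    public void destroy() {
        // nothing to clean up
    }
}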
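
To see why the request encoding matters, here is a small stand-alone sketch that needs no servlet container. It assumes the container falls back to ISO-8859-1 when no request encoding is set, and it imitates that by decoding UTF-8 bytes with the wrong charset. The class and method names (FormDecodingDemo, decodeWrongly, repair) are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

/** Illustrates what happens when UTF-8 form data is decoded as ISO-8859-1. */
class FormDecodingDemo {

    /** Decodes the UTF-8 bytes of a string with the wrong charset. */
    static String decodeWrongly(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Reverses the mistake: re-encode as ISO-8859-1, then decode as UTF-8. */
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file encoding does not matter
        String original = "\u4e2d\u6587";
        String garbled = decodeWrongly(original);

        // Each 3-byte UTF-8 character turns into 3 separate chars
        System.out.println(garbled.length());                 // 6
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

The repair step only works because ISO-8859-1 maps every byte to exactly one char. It is shown here to explain the symptom; setting the encoding up front in the filter is the cleaner fix.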

Make sure the Connector element in Tomcat's server.xml has the URIEncoding attribute set to UTF-8. This attribute controls how Tomcat decodes the request URI, including GET query parameters, which by default are not covered by the filter's setCharacterEncoding call.

<Connector port="8080" 
           protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"
           redirectPort="8443"/>
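
The effect of URIEncoding can be approximated outside Tomcat with java.net.URLDecoder. This is only a stand-in for Tomcat's own URI decoding, not its actual code, and the class name UriDecodingDemo is invented for the example. It decodes the same percent-encoded query value with two different charsets.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Decodes one percent-encoded value with different charsets. */
class UriDecodingDemo {

    /** Wraps URLDecoder.decode so callers need not handle the checked exception. */
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 percent-encoding of the two characters U+4E2D U+6587
        String encoded = "%E4%B8%AD%E6%96%87";

        System.out.println(decode(encoded, "UTF-8").equals("\u4e2d\u6587")); // true
        System.out.println(decode(encoded, "ISO-8859-1").length());          // 6 garbled chars
    }
}
```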

You also need to specify this in every JSP page:

<%@page contentType="text/html" pageEncoding="UTF-8"%>

If you need to use UTF-8 encoding (and really, everybody should be doing this these days), then you can follow the "UTF-8 everywhere HOWTO" found in the Tomcat FAQ:

http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8

Remember that you also need to support UTF-8 in your database's text fields.

Also remember that "printing" a String with non-ASCII characters in it to a log file or the console can sometimes be affected by:

  1. The character encoding of the output stream
  2. The character encoding of the file reader (e.g. cat/less/vi)
  3. The character encoding of the terminal
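
For the first item, one way to take the platform default out of the picture is to wrap the stream in a PrintStream with an explicit charset. The sketch below is an illustration, not a claim about what the original application does; the class name ExplicitEncodingPrint and the helper printWith are made up. It prints into an in-memory buffer so the resulting byte counts can be compared.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

/** Shows how the PrintStream charset changes the bytes that are written. */
class ExplicitEncodingPrint {

    /** Prints text through a PrintStream using the given charset and returns the bytes produced. */
    static byte[] printWith(String text, String charsetName) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u4e2d\u6587"; // two Chinese characters

        System.out.println(printWith(text, "UTF-8").length);      // 6 bytes
        System.out.println(printWith(text, "ISO-8859-1").length); // 2 bytes, each a '?' replacement

        // The same wrapping applied to the real console stream
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(text);
    }
}
```

Whether the last line shows readable Chinese still depends on items 2 and 3, since the terminal has to interpret the bytes as UTF-8.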

You might be better off writing the values to a file and then using a hex editor to examine the contents to be sure that you are getting the byte values you are looking for.
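
If no hex editor is at hand, the same check can be sketched in a few lines of Java: write the value to a file with an explicit charset, read the raw bytes back, and print them as hex. The class name HexCheck, the helper toHex, and the temp-file prefix are arbitrary choices for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Writes a string to a file as UTF-8 and dumps the stored bytes as hex. */
class HexCheck {

    /** Formats bytes as lowercase two-digit hex values separated by spaces. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("encoding-check", ".txt");
        try {
            // U+4E2D, written as an escape so the source file encoding does not matter
            Files.write(file, "\u4e2d".getBytes(StandardCharsets.UTF_8));
            System.out.println(toHex(Files.readAllBytes(file))); // e4 b8 ad
        } finally {
            Files.delete(file);
        }
    }
}
```

Seeing e4 b8 ad confirms the value was stored as UTF-8. A single 3f (a question mark) would instead mean the character was lost when it was encoded.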
