
Java POST data to MySQL UTF-8 encoding issue

I have POST data that contains the Japanese string AKB48 ネ申テレビ シーズン3, defined in jQuery as data:

$("#some_div").load("someurl", { data : "AKB48 ネ申テレビ シーズン3"}) 

The POST data is sent to a Java servlet:

String data = new String(this.request.getParameter("data").getBytes("ISO-8859-1"), "UTF-8");

My program saves it to MySQL, but after the data is saved to the database it becomes:

AKB48 u30CDu7533u30C6u30ECu30D3 u30B7u30FCu30BAu30F33

What should I do if I want to save it as it is, in UTF-8? All my files are in UTF-8.

The MySQL encoding is utf8, and here is the code:

String sql = "INSERT INTO Inventory (uid, item_id, item_data, ctime) VALUES ("
        + inventory.getUid() + ",'"
        + inventory.getItemId() + "','"
        + StringEscapeUtils.escapeJava(inventory.getItemData()) + "',CURRENT_TIMESTAMP)";
Statement stmt = con.createStatement();
int cnt = stmt.executeUpdate(sql);

From your example above, I can see that the Japanese string is reaching your MySQL database, but it is being saved as escaped Unicode.

I would check these items in order:

  1. Are your tables and columns all set to a utf8 character set and collation? I.e., CHARACTER SET utf8 COLLATE utf8_general_ci
  2. Are you explicitly setting the request character encoding before reading the POST parameters? I.e., request.setCharacterEncoding("UTF-8");
  3. Are you setting the character encoding for your DB connections? I.e., jdbc:mysql://localhost:3306/YOURDB?useUnicode=true&characterEncoding=UTF8

As the others have pointed out, you should not use that getBytes trick. It will surely mess up the POSTed values.

EDIT

Do not use StringEscapeUtils.escapeJava, since that will turn your string into escaped Unicode. That is what is transforming AKB48 ネ申テレビ シーズン3 into AKB48 u30CDu7533u30C6u30ECu30D3 u30B7u30FCu30BAu30F33.
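To make that transformation concrete, here is a minimal sketch that does not use Commons Lang at all. The helper escapeNonAscii is a hypothetical stand-in that only mimics how escapeJava handles non-ASCII characters, and dropBackslashBeforeU models an assumption: that MySQL, with its default string-literal rules (NO_BACKSLASH_ESCAPES not enabled), drops a backslash placed before an unrecognised character such as u.

```java
// Hypothetical stand-in for the non-ASCII handling of escapeJava; not the library code.
class UnicodeEscapeDemo {
    // Replace every char above 0x7F with a backslash, the letter u and four uppercase hex digits.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c > 0x7F) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Assumption: models MySQL dropping the backslash in front of 'u'
    // when the escaped text is pasted into a quoted SQL string literal.
    static String dropBackslashBeforeU(String s) {
        return s.replace("\\u", "u");
    }

    public static void main(String[] args) {
        String original = "AKB48 ネ申テレビ シーズン3";
        String escaped = escapeNonAscii(original);
        System.out.println(escaped);                       // what gets concatenated into the SQL
        System.out.println(dropBackslashBeforeU(escaped)); // what would end up in the column
    }
}
```

Under those assumptions the second line printed matches the value reported in the question, which suggests the damage happens in the Java code that builds the SQL, not in the table's character set.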

Why don't you just extract the value of the parameter with this.request.getParameter("data")?

Your data is sent correctly using URL encoding, where each Unicode character is replaced by its code. All you have to do is get the value of the parameter. When you request the bytes using ISO-8859-1 you are actually corrupting your data, because on the wire the string is represented as a sequence of codes in textual form.
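As a rough illustration of that URL encoding, the sketch below percent-encodes two of the characters the way a UTF-8 form post would and decodes them again with two different charsets. It uses only java.net.URLEncoder and java.net.URLDecoder (the Charset overloads need Java 10 or later); the class and method names are made up for the example, and a real servlet container does this decoding internally.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FormEncodingDemo {
    // Percent-encode a value as a UTF-8 form post would put it on the wire.
    static String encodeUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Decode a percent-encoded value, interpreting the bytes with the given charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String wire = encodeUtf8("ネ申");
        System.out.println(wire); // %E3%83%8D%E7%94%B3

        // Decoding with the charset the client used gives the text back.
        System.out.println(decode(wire, StandardCharsets.UTF_8));

        // Decoding with ISO-8859-1 turns each of the six bytes into its own character.
        System.out.println(decode(wire, StandardCharsets.ISO_8859_1).length()); // 6
    }
}
```

The point of the sketch is only that the bytes on the wire are fine; what matters is which charset the receiving side uses to turn them back into a String.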

What's the point of this line?

String data = new String(this.request.getParameter("data").getBytes("ISO-8859-1"), "UTF-8");

You're transforming Japanese (or at least non-Western) characters into bytes using the ISO-8859-1 encoding. Of course this can't work, since these characters are not supported by the ISO-8859-1 encoding. And then you're constructing a new String from those bytes, which are supposed to represent ISO-8859-1-encoded characters, using the UTF-8 encoding. This, once again, doesn't make any sense. UTF-8 and ISO-8859-1 are not the same thing, and only a small set of characters (the ASCII range) have the same encoding in both formats.

Just use

String data = this.request.getParameter("data");

and everything should be OK, provided that the column in the MySQL table uses an encoding that supports these characters.

EDIT:

Now that you've shown us the code used to insert the data into the database, I know where all this comes from (the preceding points are still valid, though). You're doing

StringEscapeUtils.escapeJava(inventory.getItemData())

What's the point? escapeJava takes a String and escapes special characters in order to make it a valid Java String literal. It has nothing to do with SQL. Use a prepared statement:

String sql = "INSERT INTO Inventory (uid, item_id, item_data, ctime) VALUES (?, ?, ?, CURRENT_TIMESTAMP)";
PreparedStatement stmt = con.prepareStatement(sql);
stmt.setInt(1, inventory.getUid()); // or setLong, depending on the type
stmt.setString(2, inventory.getItemId());
stmt.setString(3, inventory.getItemData()); // passed as-is, no manual escaping
int cnt = stmt.executeUpdate();

The PreparedStatement will take care of escaping special SQL characters correctly. Prepared statements are the best tool against SQL injection attacks, and should always be used when a query has parameters, especially if the parameters come from the end user. See http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html .

Java strings are stored in UTF-16. So, this code:

String data = new String(this.request.getParameter("data").getBytes("ISO-8859-1"), "UTF-8");

encodes a UTF-16 string (which was itself decoded from the UTF-8 bytes in the HTTP request) into a byte array using the ISO-8859-1 charset, and then decodes that byte array using the UTF-8 charset. This is almost certainly not what you want.
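A small stand-alone sketch of that round trip, using only java.nio.charset.StandardCharsets, may help. The class and method names are invented for the example. It applies the same getBytes/new String combination to two inputs: a string that was already decoded correctly, and a string that was wrongly decoded as ISO-8859-1 (which is an assumption about what some container configurations do, not something the question confirms).

```java
import java.nio.charset.StandardCharsets;

class GetBytesTrickDemo {
    // The same conversion as in the question: encode as ISO-8859-1, decode as UTF-8.
    static String isoToUtf8Trick(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "ネ申";

        // Case 1: the parameter was already decoded correctly.
        // ISO-8859-1 cannot represent these characters, so each becomes '?'.
        System.out.println(isoToUtf8Trick(correct)); // ??

        // Case 2 (assumed scenario): the UTF-8 bytes were decoded as ISO-8859-1.
        String misdecoded = new String(
                correct.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        // Only here does the conversion recover the original text.
        System.out.println(isoToUtf8Trick(misdecoded).equals(correct)); // true
    }
}
```

So the conversion only helps when the container has already decoded the request with the wrong charset; on correctly decoded input it destroys the data, which is why setting the request encoding is the safer fix.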

What happens when you use this?

String data = this.request.getParameter("data");
System.out.println(data);

If the second line prints bad data, then your problem is likely in jQuery. Check that your jQuery request is really sending Unicode:

System.out.println(this.request.getHeader("Content-Type")); // the charset, if declared, appears here

If it does not print bad data, but the data doesn't get stored correctly in MySQL, your problem is at the database level. Make sure your column type supports Unicode strings.
