
“Incorrect string value:” MySQL issue when inserting UTF8 text into a latin1 column

I have this MySQL table in production that uses the latin1 character set (its collation is latin1_swedish_ci, the latin1 default).

Right now there is incoming content (the string "\한\밤\의") in UTF-8 that needs to be inserted into a TEXT column called keywords in this table.

When I try to perform the INSERT, I get this error:

Incorrect string value: '\xED\x95\x9C\xEB\xB0\xA4...' for column 'keywords' at row 1

I have tried various ways in my Java code to convert from UTF-8 to ISO-8859-1, like the one below, and I still get the same error:

String convertedString = new String(originalString.getBytes("UTF-8"), "ISO-8859-1");
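As a minimal sketch of what that one-liner actually does (the class and method names are made up for illustration; `StandardCharsets` is used instead of the string charset names, which behaves the same but avoids the checked exception): it does not translate anything, it reinterprets each UTF-8 byte as one ISO-8859-1 character. Three Korean characters become nine Latin-1 characters, and the first six are exactly the bytes shown in the error message.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: mirrors new String(s.getBytes("UTF-8"), "ISO-8859-1").
class ReinterpretDemo {

    // Encodes to UTF-8, then decodes those bytes as ISO-8859-1 (one char per byte).
    static String reinterpretAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Formats each char as \xHH, the same style MySQL uses in its error message.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\x%02X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\uD55C\uBC24\uC758"; // the three Korean characters from the question
        String converted = reinterpretAsLatin1(original);
        System.out.println(original.length() + " chars in, " + converted.length() + " chars out");
        System.out.println(escape(converted));
    }
}
```

None of the nine resulting characters is Korean, so even if the INSERT succeeded, the column would hold byte garbage rather than the original text.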

I know there are solutions on StackOverflow that say to change the charset of the MySQL table from latin1 to UTF8. Unfortunately I cannot do that, because this is a live production MySQL master server and it has historically used latin1.

Does anyone have any suggestions to fix this "Incorrect string value" error?

Thanks IS

What you're trying to do simply isn't possible unless every character in the utf8 string also happens to have a representation in latin1. latin1 is a tiny single-byte character set (at most 256 characters in total), so the vast majority of valid utf8 characters have no latin1 equivalent.

You can't store a character in a column whose character set doesn't support it. It's not a matter of "converting" from one to the other.
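A quick way to see this from Java, as a sketch with an illustrative class name: `CharsetEncoder.canEncode` reports whether a string has any representation in a target charset. MySQL's latin1 is actually closer to Windows-1252 than to strict ISO-8859-1, but neither contains Hangul, so the answer is the same.

```java
import java.nio.charset.StandardCharsets;

// Illustrative check: does this text have any ISO-8859-1 (Latin-1) representation at all?
class Latin1Check {

    static boolean fitsInLatin1(String s) {
        // A fresh encoder per call; canEncode only reports, it never substitutes '?'.
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(fitsInLatin1("caf\u00E9"));           // true: e-acute is U+00E9
        System.out.println(fitsInLatin1("\uD55C\uBC24\uC758")); // false: Hangul has no Latin-1 form
    }
}
```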

If you need Unicode, you need at least a utf8 column, and modifying the table is the only alternative. Trying to do otherwise is like trying to store a negative number in an unsigned integer column: unsigned ints can't be negative, and it's not a matter of conversion.

This would be true of any RDBMS that supports character data types, and is not a limitation specific to MySQL.

í•œë°¤ is the Mojibake for 한밤; that is, somewhere along the way it got converted to latin1. But \한\밤 is Unicode. What mode is Python in? Do you have this at the beginning?

# -*- coding: utf-8 -*- 

More Python checklist.
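Mojibake of this kind can be reproduced by taking the UTF-8 bytes of 한밤 and decoding them as Windows-1252, which is effectively what MySQL calls latin1. A minimal sketch (class name hypothetical; assumes the Java runtime provides the windows-1252 charset, as standard JDK builds do):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative: shows how correct UTF-8 bytes turn into Mojibake when read as latin1/cp1252.
class MojibakeDemo {

    static String mojibake(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);  // ED 95 9C EB B0 A4 for the two syllables
        return new String(utf8, Charset.forName("windows-1252")); // each byte becomes one Western character
    }

    public static void main(String[] args) {
        // Prints i-acute, bullet, oe, e-diaeresis, degree sign, currency sign
        // (how it looks depends on your console encoding).
        System.out.println(mojibake("\uD55C\uBC24"));
    }
}
```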


For Korean, utf8 is preferred, though euckr is possible. But the problem is not picking the character set; it is being consistent about that character set throughout the application.

Are you using Python? The question is tagged Java.

For Java/JDBC, you need ?useUnicode=yes&characterEncoding=UTF-8 appended to the connection URL you pass to getConnection().
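A sketch of that URL form, assuming MySQL Connector/J is on the classpath at run time; the host, port, database name, and credentials are placeholders, and the class name is made up. This only takes care of the client-to-server connection; it does not make a latin1 column able to hold Korean.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: all connection details are placeholders.
class Utf8Jdbc {

    // Appends the parameters that tell Connector/J to send and receive UTF-8.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=yes&characterEncoding=UTF-8";
    }

    // Needs a reachable MySQL server and the Connector/J driver to actually connect.
    static Connection connect(String host, int port, String database,
                              String user, String password) throws SQLException {
        return DriverManager.getConnection(buildUrl(host, port, database), user, password);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("db.example.com", 3306, "mydb"));
    }
}
```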

You need these:

  • The bytes in your client need to be utf8, such as hex ED959C for 한. (Korean characters are all 3 bytes in utf8.)
  • The connection between the client and the server needs to be utf8. Performing SET NAMES utf8 right after connecting is another way to do that.
  • The column/table needs to be CHARACTER SET utf8.
  • If you are using HTML, it will need <meta charset=UTF-8>.

For Korean, utf8mb4 works just as well as utf8. Check the four bullet items above and 'prove' to us that you are doing all of them.
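One way to 'prove' the first bullet from the Java side is to dump the UTF-8 bytes of the string you are about to send and compare them with ED959C. A minimal sketch, with a hypothetical class name; the second method simply confirms the 3-bytes-per-syllable claim for the whole precomposed Hangul block (U+AC00 to U+D7A3).

```java
import java.nio.charset.StandardCharsets;

// Illustrative check of what the client is really sending.
class ClientBytes {

    // Hex dump of the UTF-8 encoding, e.g. "ED959C" for the first Korean syllable in the question.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if every precomposed Hangul syllable takes exactly 3 bytes in UTF-8.
    static boolean allHangulSyllablesAreThreeBytes() {
        for (char c = '\uAC00'; c <= '\uD7A3'; c++) {
            if (String.valueOf(c).getBytes(StandardCharsets.UTF_8).length != 3) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\uD55C"));
        System.out.println(allHangulSyllablesAreThreeBytes());
    }
}
```

If the hex you see differs (for example C3AD instead of ED), the text was already mangled before it reached the driver.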

For JSP and Java Servlets, slightly different advice is warranted.
