简体   繁体   中英

How to edit invalid UTF-8 strings in mysql database

I have some utf-8 strings in my database, they are stored as varbinary. (Generally, it's mediawiki database, but that's not important, i think). I found that some strings are not in a good shape, then i make

SELECT  log_comment, CONVERT( log_comment
USING utf8 ) AS 
COMMENT 
FROM  `logging` 
WHERE log_id = %somevalue%

i have output table in phpmyadmin like this:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| d093d09ed0a1d0a220d0a020d098d0a1d09e2fd09cd0add09a20393239342d39332e20c2abd098d0bdd184d0bed180d0bcd0b0d186d0b8d0bed0bdd0bdd0b0d18f20d182d0b5d185d0bdd0bed0bbd0bed0b3d0b8d18f2e2e2e    |NULL |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

What i need is to make this string readible, or upload new string with correct data. But this is varbinary field, how can i manage data inside it?

UPD: found that phpmyadmin automatically added 2e2e2e for three dots at the end of each line - they were too long to show. Original binary data are, if somebody interested,

d09fd0a02035302e312e3031392d3230303020d09ed181d0bdd0bed0b2d0bdd18bd0b520d0bfd0bed0bbd0bed0b6d0b5d0bdd0b8d18f20d0b5d0b4d0b8d0bdd0bed0b920d181d0b8d181d182d0b5d0bcd18b20d0bad0bbd0b0d181d181d0b8d184d0b8d0bad0b0d186d0b8d0b820d0b820d0bad0bed0b4d0b8d180d0bed0b2d0b0d0bdd0b8d18f20d182d0b5d185d0bdd0b8d0bad0be2dd18dd0bad0bed0bdd0bed0bcd0b8d187d0b5d181d0bad0bed0b920d0b820d181d0bed186d0b8d0b0d0bbd18cd0bdd0bed0b920d0b8d0bdd184d0bed180d0bcd0b0d186d0b8d0b820d0b820d183d0bdd0b8d184d0b8d186d0b8d180d0bed0b2d0b0d0bdd0bdd18bd1

anyway those strings contains non-utf symbols at the line end, as it seems from

SELECT log_comment,CAST(log_comment AS CHAR CHARACTER SET utf8) AS COMMENT
FROM  `logging` 
WHERE log_id = %somevalue%

because last symbol is - for me it seems as black rhomb with white question in it, and last 20-30 characters are missing

SELECT log_comment,CAST(log_comment AS CHAR CHARACTER SET utf8) AS COMMENT
FROM  `logging` 
WHERE log_id = %somevalue%

As it was said in Joni's comment,

"The length of the text is exactly 255 bytes, which is the limit of a MySQL tinytext/tinyblob field, and also often used by programmers as the size for varchar/varbinary. It looks like your original data has been clipped. The last D1 in your original data starts a new UTF-8 character, but the second byte is missing; that's why the last character is broken in the converted text."

In the MediaWiki DB in the field [log_comment] of the table [logging] should be stored headers of pages that was altered. Some of them appeared to be longer than 255 symbols, so while being logged they were clipped. That confused me; I thought that there was kind of database error, so i should just alter those strings - add to them missing symbols. Now i see it is slightly possible, so i just can gather necessary information from other fields.

try this:

SELECT  log_comment, 
        CONVERT(log_comment,VARCHAR(65535)) AS COMMENT 
FROM  `logging` 
WHERE log_id = %somevalue%

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM