
MySQL check if BLOB is valid UTF-8

I have data in BLOB columns in a MySQL database which I suspect is entirely UTF-8 encoded text (and would therefore be better stored as TEXT), but I would like to test this.

Is there a way to check, within SQL, whether a binary string is valid UTF-8? Then I could do something like:

SELECT SUM(IS_UTF8(col)) / SUM(1) as `percentUtf8`
FROM table
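IS_UTF8() in the query above is a placeholder, not a built-in MySQL function. Here is a minimal sketch of the intended semantics outside the database. It uses Python as an illustrative choice and assumes the BLOB values have already been fetched as `bytes`. A value counts as UTF-8 if a strict decode succeeds, and the result is the share of such values across all rows.

```python
from typing import Iterable, Optional


def is_utf8(value: bytes) -> bool:
    """Return True if value is a well-formed UTF-8 byte sequence."""
    try:
        # errors="strict" (the default) rejects overlong forms, encoded
        # surrogates and truncated multi-byte sequences.
        value.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def percent_utf8(values: Iterable[bytes]) -> Optional[float]:
    """Mirror SUM(IS_UTF8(col)) / SUM(1): the fraction of values that are valid UTF-8."""
    values = list(values)
    if not values:
        return None  # SUM() over zero rows is NULL in SQL, so the query yields NULL too
    return sum(is_utf8(v) for v in values) / len(values)


rows = [b"plain ascii", "caf\u00e9".encode("utf-8"), b"\xff\xd8\xff\xe0"]  # last: a JPEG header
print(percent_utf8(rows))  # 2 of the 3 values are valid UTF-8
```

As written, the query's expression produces a fraction between 0 and 1 rather than a percentage. Multiply by 100 if that is what `percentUtf8` should hold.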

"Efficient" for you, or for the computer? PHP's mb_check_encoding() validates a string in a single pass over the data, which is efficient for the computer. But you still have to write the code that finds all the BLOB columns, runs the SELECTs, and so on, which is less efficient for you.

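To illustrate what validating in a single pass involves, here is a hand-written validator. It is written in Python for illustration, since the answer itself uses PHP. It reads each byte once and accepts exactly the well-formed sequences listed in Table 3-7 of the Unicode Standard. This is a teaching sketch, not the implementation behind PHP's mb_check_encoding(). In real code you would call mb_check_encoding($value, 'UTF-8') in PHP or bytes.decode("utf-8") in Python.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Single pass over data, accepting only well-formed UTF-8 (Unicode Table 3-7)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:  # ASCII: a one-byte sequence
            i += 1
            continue
        # Choose the sequence length and the allowed range of the second byte.
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # excludes overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # excludes UTF-16 surrogates
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # excludes overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # excludes code points above U+10FFFF
        else:
            return False  # 80..C1 and F5..FF never start a sequence
        if i + length > n:
            return False  # sequence truncated at the end of the data
        if not lo <= data[i + 1] <= hi:
            return False
        # Any further continuation bytes must be in 80..BF.
        for j in range(i + 2, i + length):
            if not 0x80 <= data[j] <= 0xBF:
                return False
        i += length
    return True


print(is_valid_utf8("\u00e9t\u00e9".encode("utf-8")), is_valid_utf8(b"\xed\xa0\x80"))  # True False
```

The special cases for lead bytes E0, ED, F0 and F4 are what make the check strict. Without them, overlong encodings, encoded UTF-16 surrogates and code points above U+10FFFF would be accepted.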
Well, here is a way to generate the tedious part:

> mysql ... information_schema > sql.inc
SELECT  CONCAT('Foo("', table_schema, '", "', table_name, '", "', column_name, '");')
    FROM  tables
    JOIN  columns USING (table_schema, table_name)
    WHERE  column_type LIKE '%BLOB'
      OR   column_type LIKE '%BINARY%';
exit;

For me, sql.inc contained something like this:

Foo("test", "07", "md5");
Foo("test", "jpg", "jpg");
Foo("test", "key2", "stuff");
Foo("test", "picsav", "thumb");
Foo("try", "bin16", "bin16");
Foo("try", "bin16", "bin32");
Foo("try", "blobs", "b");
Foo("try", "f521951", "blob_c");

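Then write the PHP function Foo($db, $tbl, $col) so that it runs the test on each value (for example with mb_check_encoding($value, 'UTF-8')) and displays whatever you like, or acts on the result. Finally, pull in the generated calls with require "sql.inc"; note that sql.inc needs a leading <?php line, otherwise PHP echoes its contents as plain text instead of executing them.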
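The answer leaves Foo() to the reader. As an illustration only, here is a Python analogue rather than PHP. The function `foo(cursor, db, tbl, col)` is hypothetical. Unlike the generated Foo(db, tbl, col) calls, it takes the database cursor as an extra first argument, and it accepts any DB-API 2.0 cursor that supports iteration. It selects the column and reports how many non-NULL values are valid UTF-8. Identifiers are backtick-quoted because they are interpolated into the SQL text. The demo uses an in-memory SQLite database purely as a stand-in for a MySQL connection.

```python
import sqlite3
from typing import Any, Tuple


def quote_ident(name: str) -> str:
    """Backtick-quote an identifier (MySQL style), doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def is_utf8(value: bytes) -> bool:
    """True if value decodes as strict UTF-8."""
    try:
        value.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def foo(cursor: Any, db: str, tbl: str, col: str) -> Tuple[int, int]:
    """Hypothetical Foo(): report how many non-NULL values of db.tbl.col are valid UTF-8.

    `cursor` is any iterable DB-API 2.0 cursor. Returns (valid, total).
    """
    c = quote_ident(col)
    cursor.execute(f"SELECT {c} FROM {quote_ident(db)}.{quote_ident(tbl)} WHERE {c} IS NOT NULL")
    valid = total = 0
    # Iterate row by row; whether rows are streamed or buffered depends on the driver.
    for (value,) in cursor:
        total += 1
        valid += is_utf8(bytes(value))  # some drivers return bytearray instead of bytes
    print(f"{db}.{tbl}.{col}: {valid}/{total} values are valid UTF-8")
    return valid, total


# Demo: in-memory SQLite standing in for MySQL. SQLite also accepts
# backtick-quoted identifiers, and "main" is its default schema name.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE blobs (b BLOB)")
cur.executemany("INSERT INTO blobs VALUES (?)", [(b"hello",), (b"\xff\xfe",), (None,)])
demo_result = foo(cur, "main", "blobs", "b")
```

Against MySQL you would pass a cursor from whichever driver you already use (which driver is an assumption left open here). A column where `valid` equals `total` is a candidate for conversion to TEXT.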
