How to ensure all data going in and out of a database is utf-8 encoded?

I just learned about character sets today, so forgive the newb factor if this is confusing. Please ask for clarification if it's needed.

I wrote a program in PHP which recursively goes through the files in a folder and stores the file names in a database. The file names are then all exported from the database in JSON format using the json_encode($array) function.

However, this function only works with UTF-8 encoded data. Since a few of the key-value pairs in the JSON export have the value null, I'm led to believe that those filename strings taken from the database are in fact not utf-8.

I've ensured that all the data going in and out of the database is utf-8 by setting the defaults to utf-8 in my.cnf and restarting mysql from the command line using service mysql restart:

[client]
default-character-set=utf8

[mysqld]
default-character-set = utf8

I then created my database, the table and all the columns in the table, and confirmed that the database, table and all the columns are in fact utf-8.

Checks if database is utf-8:

SELECT default_character_set_name FROM information_schema.SCHEMATA S
WHERE schema_name = "schemaname";

Checks if table is utf-8:

SELECT CCSA.character_set_name FROM information_schema.`TABLES` T,
       information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
WHERE CCSA.collation_name = T.table_collation
  AND T.table_schema = "schemaname"
  AND T.table_name = "tablename";

Checks if field is utf-8:

SELECT character_set_name FROM information_schema.`COLUMNS` C
WHERE table_schema = "schemaname"
  AND table_name = "tablename"
  AND column_name = "columnname";

There's this file that has the characters –µ–ª–∫—É–Ω—á–∏–∫ in the file name. When it's stored in the database the values appear as –©–µ–ª–â'.

Given my database settings, are all the strings going in and out of my database utf-8?

What can I do to ensure the data I am SELECT'ing from the database is utf-8, so I can perform json_encode($array)? (NOTE: this function only works on utf-8 encoded data)

Unfortunately I don't know how you can ensure everything coming out is UTF-8 (now I'm curious too!), but a starting point would be to try this in your PHP:

$encodedNames = array();
$errors = array();

// Loop through all of the filenames
foreach($filenames as $filename)
{
  // Check if it's UTF-8 encoded
  if('UTF-8' === mb_detect_encoding($filename, 'UTF-8', true))
  {
    $encodedNames[] = $filename;
  }
  else
  {
    $errors[] = $filename;
  }
}

// json_encode the UTF-8 filenames
$jsonString = json_encode($encodedNames);

// Log the other filenames here so you can deal with them later...

http://php.net/manual/en/function.mb-detect-encoding.php
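
One more thing that might be worth checking: the my.cnf defaults and the column charsets may not decide which encoding the PHP connection itself uses, so it is common to name the charset when connecting. Here is a minimal sketch, assuming PDO with the MySQL driver. The helper name buildDsn and the host are placeholders, not taken from the question.

```php
<?php
// Hypothetical helper: build a PDO DSN that names the connection charset.
// utf8mb4 is MySQL's 4-byte UTF-8 charset; the question's config uses utf8.
function buildDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Usage (not run here, it needs a live server and real credentials):
// $pdo = new PDO(buildDsn('localhost', 'schemaname'), $user, $password);
// With mysqli, the comparable step is $mysqli->set_charset('utf8mb4');
```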
