PHP output encoding issues with UTF-8 strings from MySQL databases

I know this question comes up in one form or another all the time on here, but I'm at a loss on how to resolve it. I've got a PHP website running off MySQL that shows some extended characters as a garbled mess. As far as I know, everything is encoded as UTF-8 at every step, from the content import to displaying it on screen. Still, it shows weird encoding issues. Here's the first test example (Natural Phënåm¥na, the odd characters are on purpose), which mb_detect_encoding identifies as UTF-8 and which I can only get to display correctly with utf8_decode:

no utf8_decode: Natural Phënåm¥na
utf8_decode: Natural Phënåm¥na

Second example, which never decodes properly even with utf8_decode (it should contain an umlaut and “typographer's quotes”; the extended characters were added on purpose as a test):

no utf8_decode: This pürson from “Vancouver, Canadaâ€
utf8_decode: This pürson from �??Vancouver, Canada�?�

My initial thought was that it was doubly encoded, but I don't think that's what's going on. Everything displays correctly in MySQL when I run queries on the command line.

Here's a rundown of everything I've investigated:

  • Imported content is verified to be UTF-8 and was imported over a UTF-8 connection to MySQL
  • MySQL database, tables, and columns are UTF-8, with utf8_unicode_* collations
  • character_set_client and the related variables in MySQL are set to utf8 on Amazon RDS
  • PHP PDO connection is UTF-8 (SET NAMES utf8)
  • Both the PHP header charset and the HTML meta charset are UTF-8
  • mb_detect_encoding returns UTF-8 for both strings

So after a few hours of troubleshooting, I'm at a loss. On a whim I even tried setting the HTML meta charset and the PHP header to ISO-8859-1, but that doesn't do the trick either.

I last spent a while battling with Amazon RDS to get the right variables set, but otherwise I'm out of ideas.

mysql> show variables like '%character%';
+--------------------------+-------------------------------------------+
| Variable_name            | Value                                     |
+--------------------------+-------------------------------------------+
| character_set_client     | utf8                                      |
| character_set_connection | utf8                                      |
| character_set_database   | utf8                                      |
| character_set_filesystem | utf8                                      |
| character_set_results    | utf8                                      |
| character_set_server     | utf8                                      |
| character_set_system     | utf8                                      |
| character_sets_dir       | /rdsdbbin/mysql-5.5.40.R1/share/charsets/ |
+--------------------------+-------------------------------------------+

So I'm wondering, are there steps I'm missing? Something obvious? Thanks in advance.

UPDATE

Here's my PHP output script, to clarify the "output" that I mentioned:

<?php header("Content-type: text/html; charset=utf-8"); ?>
<html>
<header>
    <meta charset="utf-8" />
    <title>My test</title>
</header>
    <body>
<?php

    try {
        $dbh = new PDO("mysql:host=localhost;dbname=database",
            "user", "password", array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
    }
    catch(PDOException $e) {
        echo $e->getMessage();
    }

    $sth = $dbh->prepare("my select statement");
    $sth->execute();
    $rows = $sth->fetchAll(PDO::FETCH_ASSOC);

    foreach ($rows as $row) {
        echo mb_detect_encoding($row['name']);
        echo "<br>no utf8 decode: ". $row['name'] . "<br>\n";
        echo "single utf8 decode: ". utf8_decode($row['name']) . "<br>\n";
        echo "no utf8 decode: ". $row['description'] . "<br>\n";
        echo "single utf8 decode: ". (utf8_decode($row['description'])) . "<br>\n";
    }

?>
</body>
</html>

UPDATE #2: I also tried outputting these same characters to the browser directly from a PHP echo and from straight static HTML, and the characters display perfectly fine.

<?php echo "“test ü ö”<br>"; ?>
<p>“test ü ö”</p>

You should not change all the character_set% variables, just the three that are affected by SET NAMES utf8; (character_set_client, character_set_connection, and character_set_results).
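As a rough sketch of setting only those three session variables from PHP, the charset can be passed in the PDO DSN instead of an init command. The helper names buildDsn and connectAndReport, the host, the database name, and the credentials are placeholders, and the utf8mb4 default is an assumption (pass utf8 to match the setup in the question):

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that carries the connection charset.
function buildDsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: connects and returns the three session variables
// that SET NAMES (or the DSN charset) controls, keyed by variable name.
function connectAndReport(string $user, string $password): array
{
    $dbh = new PDO(buildDsn('localhost', 'database'), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);

    $sql = "SHOW SESSION VARIABLES WHERE Variable_name IN
            ('character_set_client', 'character_set_connection', 'character_set_results')";

    // SHOW VARIABLES returns two columns, so FETCH_KEY_PAIR gives name => value.
    return $dbh->query($sql)->fetchAll(PDO::FETCH_KEY_PAIR);
}
```

Calling connectAndReport() with real credentials should list the same value for all three variables; the other character_set% variables stay at whatever the server configures.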

Don't use utf8_encode or utf8_decode.

You probably messed something up when storing the data.

This seems to recover the characters, but it is not a viable fix:

CONVERT(CAST(CONVERT('pürson from “Vancouver, Canadaâ€' USING latin1)
             AS BINARY)
        USING utf8)
--> 'pürson from “Vancouver, Canada - spec',
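For inspection on the PHP side, the same round trip can be sketched with mb_convert_encoding, assuming the mbstring extension is loaded. The function name undoDoubleEncoding is made up for illustration, and Windows-1252 is used because that is what MySQL's latin1 corresponds to. Like the SQL above, this is a diagnostic, not a fix for the stored data:

```php
<?php
// Hypothetical helper: reverses one layer of double encoding by mapping each
// character back to the single byte it stood for in Windows-1252, which
// leaves the original UTF-8 byte sequence.
function undoDoubleEncoding(string $s): string
{
    return mb_convert_encoding($s, 'Windows-1252', 'UTF-8');
}

// "pürson" as it looks when the UTF-8 bytes of "ü" were encoded a second time.
$garbled = "p\u{00C3}\u{00BC}rson";
$fixed   = undoDoubleEncoding($garbled);

echo bin2hex($garbled), "\n";   // 70c383c2bc72736f6e
echo bin2hex($fixed), "\n";     // 70c3bc72736f6e
```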

To figure out what was done, please provide

SELECT col, HEX(col) FROM tbl WHERE ...

for some cell that is not rendering properly.
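The same check can be made on what PHP actually receives. A minimal sketch, where hexBytes is a hypothetical helper and the two sample strings stand in for a fetched column value:

```php
<?php
// Hypothetical helper: returns the bytes of a string as space-separated
// uppercase hex pairs, for comparison with MySQL's HEX(col).
function hexBytes(string $s): string
{
    return strtoupper(implode(' ', str_split(bin2hex($s), 2)));
}

$correct = "\u{00FC}";           // "ü" encoded once as UTF-8
$doubled = "\u{00C3}\u{00BC}";   // "Ã¼", which is "ü" encoded twice

echo hexBytes($correct), "\n";   // C3 BC
echo hexBytes($doubled), "\n";   // C3 83 C2 BC
```

If a "ü" in the column comes back as C3 BC it was stored once as UTF-8; C3 83 C2 BC points to double encoding.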

You mentioned that everything is UTF-8 throughout the data flow, except when it is rendered on screen. I'm assuming that means in a browser, not a console. If so, check that the HTML has <meta charset="utf-8"> inside the <head> tag, as in the HTML5 Boilerplate: https://github.com/h5bp/html5-boilerplate/blob/master/dist/index.html

So it looks like the UTF-8 characters in some of these fields were somehow double-encoded at the MySQL level. I was finally able to ascertain this via the great blog post "Getting out of MySQL Character Set Hell". It's not 100% clear whether the data gets double-encoded when it's sent from Python or when it hits the PHP API, but that's 90% of the answer right there.
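To illustrate why this was hard to spot, here is a small simulation of double encoding, assuming the mbstring extension. It is not the actual import code, which isn't shown here; it only treats UTF-8 bytes as ISO-8859-1 and encodes them again. The result is still valid UTF-8, which would explain why mb_detect_encoding kept reporting UTF-8:

```php
<?php
$original = "Ph\u{00EB}n\u{00E5}m\u{00A5}na";   // "Phënåm¥na", valid UTF-8

// Simulate the bug: read the UTF-8 bytes as ISO-8859-1 and encode to UTF-8 again.
$doubled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

var_dump(mb_check_encoding($doubled, 'UTF-8'));   // bool(true): still valid UTF-8
echo mb_strlen($original, 'UTF-8'), "\n";          // 9
echo mb_strlen($doubled, 'UTF-8'), "\n";           // 12: each accented character became two

// Converting back the other way restores the original bytes.
$restored = mb_convert_encoding($doubled, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $original);                 // bool(true)
```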
