
Handling character encoding from Java to PHP to MySQL

In Java I pass a String to PHP.

In PHP I take that String and search for it with a MySQL query.

Here is the PHP code:

    // Escape the search term posted by the Java client.
    $query = $database->escape_value(trim($_POST['query']));
    $result = mysqli_query($dbconnection, Data::getSearchQuery($query));
    $output = []; // so an empty result encodes as [] rather than null
    while ($row = mysqli_fetch_assoc($result)) {
        $output[] = $row;
    }
    print(json_encode($output));

    mysqli_close($dbconnection);


    // Method of the Data class: builds the LIKE search over items.
    public static function getSearchQuery($item_query) {

        $query = "
            SELECT i.item, i.item_id, c.category, c.cat_id
            FROM items AS i
            LEFT JOIN master_cat AS c
                    ON (c.cat_id = i.cat_id)
            WHERE i.item LIKE '%{$item_query}%'
            ORDER BY i.item ASC;";

        return $query;
    }

This always works if I use regular characters on my US keyboard. But the moment I start using irregular (non-ASCII) characters, the search comes back empty.

I can verify that MySQL stores the data AS THE USER ENTERS IT. So if they typed Beyoncè, that is how the database stores it.

But when I search for Beyoncè (or whatever) in the above code, it returns empty.

How should I handle the character encoding here?

Three points to consider:

1) The $item_query variable could arrive in the wrong encoding.
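On the Java side, point 1 comes down to which charset is used when the search term is turned into bytes for the POST body. The following is a minimal sketch, not the asker's actual client code: the class and method names are hypothetical, it assumes the request is sent as application/x-www-form-urlencoded with a field named query (the name read by $_POST['query']), and it needs Java 10 or later for the URLEncoder.encode(String, Charset) overload.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: shows how the chosen charset changes the bytes PHP receives.
class QueryEncoding {

    // Builds the form body for the "query" field read by $_POST['query'].
    static String formBody(String searchTerm, Charset charset) {
        return "query=" + URLEncoder.encode(searchTerm, charset);
    }

    public static void main(String[] args) {
        // "Beyoncé", written with an escape so the source file encoding cannot interfere.
        String term = "Beyonc\u00e9";

        // UTF-8: the é becomes two bytes (C3 A9).
        System.out.println(formBody(term, StandardCharsets.UTF_8));

        // ISO-8859-1: the é becomes a single byte (E9), which is not valid UTF-8 on its own.
        System.out.println(formBody(term, StandardCharsets.ISO_8859_1));
    }
}
```

If the two printed bodies differ for your search term, then the charset chosen in the Java client decides what PHP, and in turn MySQL, gets to see.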

2) >> I can verify that MySQL stores the data AS THE USER ENTERS IT

This can get tricky. If one writes an iso8859-1 encoded string to a utf-8 database, the string is stored incorrectly. If that string is then read with a client (e.g. phpmyadmin or the mysql command line tool) configured for iso8859-1, it is displayed correctly, even though its representation in the database is wrong.
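One common variant of this situation can be simulated without a database. The sketch below is only a model under stated assumptions: the client really sends UTF-8 bytes, the connection is declared as latin1, and the column is UTF-8. ISO_8859_1 stands in for MySQL's latin1 (which is closer to windows-1252; the two agree on the bytes used here), and all class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of a mislabelled connection; no real MySQL calls are involved.
class DoubleEncoding {

    // Write path: the server believes the incoming bytes are latin1
    // and converts them to UTF-8 for storage.
    static byte[] storeOverLatin1Connection(byte[] clientBytes) {
        String asSeenByServer = new String(clientBytes, StandardCharsets.ISO_8859_1);
        return asSeenByServer.getBytes(StandardCharsets.UTF_8);
    }

    // Read path: the server converts the stored UTF-8 back to latin1 for the client.
    static byte[] readOverLatin1Connection(byte[] storedBytes) {
        String stored = new String(storedBytes, StandardCharsets.UTF_8);
        return stored.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] sent = "Beyonc\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] stored = storeOverLatin1Connection(sent);
        byte[] returned = readOverLatin1Connection(stored);

        System.out.println("bytes sent:     " + sent.length);
        System.out.println("bytes stored:   " + stored.length);
        System.out.println("round trip ok:  " + Arrays.equals(sent, returned));
        System.out.println("stored == sent: " + Arrays.equals(sent, stored));
    }
}
```

In this model the round trip through the same misconfigured client hides the problem, while the stored bytes differ from the properly encoded UTF-8 value. A search issued over a correctly configured UTF-8 connection would therefore compare against different bytes, which is one plausible way to end up with an empty result.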

3) The MySQL settings: Have you set utf-8 for the connection itself? What about the charsets and collations of the database and the table?

https://dev.mysql.com/doc/refman/5.5/en/charset-syntax.html

UPDATE: I assume you want everything to be UTF-8. Here is a quick hack to test that:

  • Beyoncé has 7 characters (see the MySQL CHAR_LENGTH function)
  • in UTF-8, it occupies 8 bytes (see the MySQL LENGTH function). When those eight bytes are displayed in a one-byte-per-character encoding like windows-1252, they look something like Beyoncé.
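The same numbers can be checked on the Java side before anything is sent. This is a small sketch with hypothetical names; it uses ISO-8859-1 as the one-byte-per-character encoding because it is guaranteed to exist on every JVM and maps the two bytes in question to the same characters as windows-1252.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring MySQL's CHAR_LENGTH and LENGTH for a Java String.
class LengthCheck {

    // Number of characters (code points), comparable to CHAR_LENGTH.
    static int charCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes once encoded as UTF-8, comparable to LENGTH on a UTF-8 value.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // What the UTF-8 bytes look like when misread one byte per character.
    static String misreadAsSingleByte(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String term = "Beyonc\u00e9"; // "Beyoncé"
        System.out.println(charCount(term));           // 7
        System.out.println(utf8ByteCount(term));       // 8
        System.out.println(misreadAsSingleByte(term)); // Beyonc followed by à and ©
    }
}
```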

This leads to the following diagnostic tests:

  1. The PHP-issued SQL command

     "SELECT CHAR_LENGTH('$item_query'), LENGTH('$item_query');"

    (note the quotes around the escaped value) should return (7, 8) when the search term is Beyoncé. That shows the $item_query variable is probably correctly encoded and the database connection is handling UTF-8. (7, 7) would mean $item_query wasn't UTF-8, and (8, 8) would mean the connection is not handling UTF-8 yet. In the latter case, issue SET NAMES 'UTF8'; before the query (or call mysqli_set_charset($dbconnection, 'utf8') from PHP).

  2. Similarly, the PHP-issued SQL command

     SELECT CHAR_LENGTH('Beyoncé'), LENGTH('Beyoncé');

    should return (7, 8) to show that your PHP editor is configured to save PHP files as UTF-8.

  3. Repeat the previous step with phpmyadmin (or any SQL client) to be sure that this client uses UTF-8, too.

  4. No table was involved yet! The SQL command

     SELECT CHAR_LENGTH(somecolumn), LENGTH(somecolumn) FROM sometable;

    (with sometable having UTF-8 character encoding and somecolumn containing some characters with diacritics) should tell you whether UTF-8 was used when the values were stored in the table.

  5. If all previous tests passed, test again with LIKE. Even 'Beyoncé' LIKE 'Beyonce' should match then, provided the collation is accent-insensitive (utf8_general_ci, for example). For more information, search for MySQL collations.
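The three outcomes from test 1 can be reproduced with a small model, which may help when reading the numbers MySQL reports. This is only a sketch under assumptions: it treats a string literal as raw bytes interpreted in the connection charset, uses ISO_8859_1 as a stand-in for a non-UTF-8 connection, and the class and method names are hypothetical rather than anything MySQL or the PHP code provides.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical model of (CHAR_LENGTH, LENGTH) for a literal sent over a connection.
class LengthDiagnosis {

    // Returns {characters, bytes} for literal bytes interpreted in the connection charset.
    static int[] lengths(byte[] literalBytes, Charset connectionCharset) {
        String decoded = new String(literalBytes, connectionCharset);
        return new int[] { decoded.codePointCount(0, decoded.length()), literalBytes.length };
    }

    // Reads the pair as the answer does; only meaningful for the test word Beyoncé.
    static String interpret(int[] pair) {
        if (pair[0] == 7 && pair[1] == 8) return "value is UTF-8 and the connection handles UTF-8";
        if (pair[0] == 7 && pair[1] == 7) return "value was not UTF-8";
        if (pair[0] == 8 && pair[1] == 8) return "connection is not handling UTF-8 yet";
        return "unexpected result, check the test word";
    }

    public static void main(String[] args) {
        String term = "Beyonc\u00e9"; // "Beyoncé"
        byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = term.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(interpret(lengths(utf8, StandardCharsets.UTF_8)));        // the (7, 8) case
        System.out.println(interpret(lengths(latin1, StandardCharsets.ISO_8859_1))); // the (7, 7) case
        System.out.println(interpret(lengths(utf8, StandardCharsets.ISO_8859_1)));   // the (8, 8) case
    }
}
```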
