
Handling character encoding from Java to PHP to MySQL

In Java I pass a String to PHP.

In PHP I take that String and do a search for it with a MySQL query.

Here is the PHP code:

    $query = $database->escape_value(trim($_POST['query'])); 
    $result = mysqli_query($dbconnection, Data::getSearchQuery($query));
    while ($row = mysqli_fetch_assoc($result)) {
        $output[] = $row;
    }
    print(json_encode($output));

    mysqli_close($dbconnection);


  public static function getSearchQuery($item_query) {

        $query = "
            SELECT i.item, i.item_id, c.category, c.cat_id
            FROM items as i
            LEFT JOIN master_cat AS c
                    ON (c.cat_id = i.cat_id)
            WHERE i.item LIKE '%{$item_query}%' 

            ORDER BY i.item ASC;";

        return $query;
    }

This always works when I use the plain characters on my US keyboard. But as soon as I use accented or other non-ASCII characters, the search comes back empty.

I can verify that MySQL stores the data AS THE USER ENTERS IT. So if they typed Beyoncè, that is how the database stores it.

But when I search for Beyoncè (or whatever) in the above code, it returns empty.

How should I handle the character encoding here?

Three points to consider:

1) The $item_query variable could arrive in the wrong encoding.

2) >> I can verify that MySQL stores the data AS THE USER ENTERS IT

This can get tricky. If an ISO-8859-1 encoded string is written to a UTF-8 database, the string is stored incorrectly. If that string is then read with a client (e.g. phpMyAdmin or the mysql command line tool) that is also configured for ISO-8859-1, it is displayed correctly, even though its representation in the database is clearly wrong.

3) The MySQL settings: have you set UTF-8 for the connection itself? What about the charsets and collations for the database and the table?

https://dev.mysql.com/doc/refman/5.5/en/charset-syntax.html
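
Points 1 and 3 can both be checked from PHP. The following is only a minimal sketch, assuming the mbstring and mysqli extensions are available and that utf8mb4 is the charset you want for the connection. The helper names isValidUtf8 and openUtf8Connection are made up for illustration and do not appear in the question's code.

```php
<?php
// Hypothetical helper: true when the bytes form a valid UTF-8 sequence.
function isValidUtf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

// Hypothetical helper: opens a mysqli connection and sets its charset.
// Assumption: utf8mb4 is the charset you want end to end.
function openUtf8Connection(string $host, string $user, string $pass, string $db): mysqli
{
    // Throw exceptions instead of silently returning false on errors.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
    $conn = new mysqli($host, $user, $pass, $db);
    // Declares the connection charset to both the server and the client library.
    $conn->set_charset('utf8mb4');
    return $conn;
}
```

Calling isValidUtf8(trim($_POST['query'])) before building the query would show whether the Java side is really sending UTF-8.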

UPDATE: I assume you want everything to be UTF-8. Here is a quick hack to test that:

  • Beyoncé has 7 characters (see MySQL CHAR_LENGTH function)
  • in UTF-8, it occupies 8 bytes (see MySQL LENGTH function). Displayed in a one-byte-per-character encoding such as windows-1252, those eight bytes look something like BeyoncÃ©.
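
The character and byte counts can be reproduced in PHP without touching the database. This is a small sketch, assuming PHP 7 or later (for the "\u{...}" escape) and the mbstring extension; the variable names are chosen here for illustration.

```php
<?php
// "\u{00E9}" is the code point for é, so this does not depend on the file's own encoding.
$name = "Beyonc\u{00E9}";

$charCount = mb_strlen($name, 'UTF-8'); // characters, comparable to MySQL CHAR_LENGTH
$byteCount = strlen($name);             // bytes, comparable to MySQL LENGTH
$hexBytes  = bin2hex($name);            // the raw UTF-8 bytes in hex

// Reinterpret the same bytes as windows-1252 and express the result in UTF-8.
$misread = mb_convert_encoding($name, 'UTF-8', 'Windows-1252');

printf("%d characters, %d bytes (%s), misread as: %s\n",
    $charCount, $byteCount, $hexBytes, $misread);
```

The final two bytes, c3 a9, are the UTF-8 encoding of é; read as windows-1252 they become the two characters Ã and ©.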

This leads to the following diagnostic tests ...

  1. The PHP-issued SQL command (with the escaped value quoted as a string literal)

     "SELECT CHAR_LENGTH('{$item_query}'), LENGTH('{$item_query}');"

    should return (7, 8) when searching for Beyoncé, which shows that $item_query is probably correctly encoded and that the connection treats it as UTF-8. (7, 7) would mean $item_query wasn't UTF-8, and (8, 8) would mean the database connection isn't handling UTF-8 yet. In the latter case, try issuing SET NAMES 'UTF8'; before the query.

  2. Similarly, the PHP-issued SQL command

     SELECT CHAR_LENGTH('Beyoncé'), LENGTH('Beyoncé'); 

    should return (7, 8), which shows that your PHP editor is configured to save PHP files as UTF-8.

  3. Repeat the previous step with phpMyAdmin (or any other SQL client) to be sure that this client uses UTF-8, too.

  4. No table was involved yet! The SQL command

     SELECT CHAR_LENGTH(somecolumn), LENGTH(somecolumn) FROM sometable; 

    (with sometable using a UTF-8 character encoding and somecolumn containing some characters with diacritics) should tell you whether UTF-8 was used when the values were stored in the table.

  5. If all previous tests passed, test again with LIKE. With an accent-insensitive collation such as utf8_general_ci, even 'Beyoncé' LIKE 'Beyonce' should then match. For more information, search for MySQL collation.
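
Test 1 can also be run with a bound parameter, which avoids quoting the value by hand. What follows is a rough sketch rather than a definitive implementation: it assumes an open mysqli connection, the mysqlnd driver (needed for get_result), and the test word Beyoncé. The helper names interpretLengths and measureOnServer are hypothetical.

```php
<?php
// Hypothetical helper: maps the (CHAR_LENGTH, LENGTH) pair for "Beyoncé"
// to the readings described in test 1.
function interpretLengths(int $chars, int $bytes): string
{
    if ($chars === 7 && $bytes === 8) {
        return 'input and connection both look like UTF-8';
    }
    if ($chars === 7 && $bytes === 7) {
        return 'input was not UTF-8';
    }
    if ($chars === 8 && $bytes === 8) {
        return 'connection is not treating the input as UTF-8';
    }
    return 'unexpected result';
}

// Hypothetical helper: asks the server how it counts the given string.
// Assumption: $conn is an open mysqli connection using the mysqlnd driver.
function measureOnServer(mysqli $conn, string $value): array
{
    $stmt = $conn->prepare('SELECT CHAR_LENGTH(?) AS chars, LENGTH(?) AS bytes');
    $stmt->bind_param('ss', $value, $value);
    $stmt->execute();
    $row = $stmt->get_result()->fetch_assoc();
    $stmt->close();
    return [(int) $row['chars'], (int) $row['bytes']];
}
```

With the question's connection this could be used as [$chars, $bytes] = measureOnServer($dbconnection, $query); followed by echo interpretLengths($chars, $bytes);.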


 