UTF8 characters from database don't show up properly in the browser - MySQL & PHP CodeIgniter

My database and tables are set to utf8_general_ci collation and utf8 charset. CodeIgniter is set to utf8. I've added a meta tag with charset=utf8, and I'm still getting something like квартиры instead of Cyrillic letters.

The same code works fine on my local machine (Mac OS X). It only breaks on the production machine, which is Ubuntu 11.10 64-bit on AWS EC2. Static content from the .php files shows up correctly; only the data coming from the database is messed up. Example page: http://dev.uzlist.com/browse/cat/nkv

Any ideas why?

Thanks.

FYI: When I error_log() the data coming from the database, I get the same values I'm seeing on the page. Hence, it's not a browser-server issue. It's something between MySQL and PHP, since running SELECT * FROM categories shows the data in the right format. I'm using the PHP CodeIgniter framework for the database connection and queries and, as mentioned here, I have configured it to use a utf8 connection and utf8_general_ci collation.

Make sure your my.cnf (likely in /etc/ or /etc/mysql/) has the following entries:

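[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'
# Older spellings of the first two options. They are deprecated and were
# removed in MySQL 5.5, where they stop the server from starting, so
# uncomment them only on older versions:
# default-character-set=utf8
# default-collation=utf8_general_ci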

[client]
default-character-set=utf8

You'll need to restart the mysql service once you make your changes.

Adding my comments in here to make this a little clearer.

Make sure the following HTTP header is being set so the browser knows what charset to expect.

Content-type: text/html; charset=UTF-8

Also try adding this tag at the top of your HTML <head> element:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

For the page to display correctly in the browser, check three things:

  1. The encoding of your script file.
  2. The encoding of the connection.
  3. The encoding of the database or table schema.

If all of these are compatible, you'll get the page you want.

The original data has been encoded as UTF-8, the result interpreted in Windows-1252 and then UTF-8 encoded again. This is really bad; it isn't about a simple encoding mismatch that a header would fix. Your data is actually broken.
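
To illustrate that diagnosis, here is a minimal PHP sketch that reproduces the garbled string from the question and then reverses the extra encoding step. It assumes the mbstring extension is installed and that the script file itself is saved as UTF-8. The variable names are only illustrative.

```php
<?php
// Correct UTF-8 text (this file must be saved as UTF-8).
$original = "квартиры";

// Misread the UTF-8 bytes as Windows-1252, then encode the result as UTF-8 again.
$doubleEncoded = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

// Undo the extra layer: map each character back to its single Windows-1252 byte.
$repaired = mb_convert_encoding($doubleEncoded, 'Windows-1252', 'UTF-8');

echo $doubleEncoded, "\n";                 // квартиры
echo $repaired, "\n";                      // квартиры
echo bin2hex("к"), "\n";                   // d0ba
echo bin2hex(mb_convert_encoding("к", 'UTF-8', 'Windows-1252')), "\n"; // c390c2ba
```

The reverse conversion works here only because every byte involved has a Windows-1252 mapping. On real data, try any such repair on a backup copy first.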

If the data is OK in the database (check with SELECT hex(column) FROM myTable to see whether it was already double encoded there), then some of your code must be converting it to UTF-8 on output.

Search your project for uses of the functions utf8_encode, convert_to_utf8, iconv, or mb_convert_encoding. Running

$ grep -rn "\(utf8_\(en\|de\)code\|convert_to_utf8\|iconv\|mb_convert_encoding\)" .

in your project's /application folder should be enough to find something.

Also check these config values:

<?php
var_dump(
    ini_get( "mbstring.http_output" ),
    ini_get( "mbstring.encoding_translation" )
);
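
For the SELECT hex(column) check mentioned above, the following sketch shows one way to interpret the result. looksDoubleEncoded() is a hypothetical helper, not part of PHP or CodeIgniter, and it is a heuristic rather than a proof: it peels off one UTF-8 layer via Windows-1252 and checks whether valid UTF-8 remains underneath. It assumes the mbstring extension is installed.

```php
<?php
// Hypothetical helper: $hex is a value returned by SELECT HEX(column).
function looksDoubleEncoded(string $hex): bool
{
    $bytes = @hex2bin($hex);
    if ($bytes === false || !mb_check_encoding($bytes, 'UTF-8')) {
        return false; // not hex, or not valid UTF-8 to begin with
    }

    // Strip one UTF-8 layer by mapping each character to a Windows-1252 byte.
    $inner = mb_convert_encoding($bytes, 'Windows-1252', 'UTF-8');

    // Characters outside Windows-1252 are replaced by a substitute character,
    // so the round trip fails for text that was never double encoded
    // (for example, correctly stored Cyrillic).
    $roundTrip = mb_convert_encoding($inner, 'UTF-8', 'Windows-1252');

    return $inner !== $bytes
        && $roundTrip === $bytes
        && mb_check_encoding($inner, 'UTF-8');
}

var_dump(looksDoubleEncoded('D0BA'));     // "к" stored correctly
var_dump(looksDoubleEncoded('C390C2BA')); // "к" stored as "к"
```

A correctly stored Cyrillic letter takes two bytes in UTF-8 (D0BA for "к"), while the double-encoded form takes four (C390C2BA).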

Well, if you are absolutely sure that your MySQL client encoding is set to utf8, there are two possible cases. One, double encoding, is described by Esailija.

But there is another one: your data is actually encoded in Windows-1251, not in UTF-8. In this case you have to either recode your data or set the proper encoding on the tables, though it is not a one-click task.
Here is a manual (in Russian) for exactly that case: http://phpfaq.ru/charset#repair

In short, you have to dump your table using the same encoding that is set on the table (to avoid recoding), back up that dump in a safe place, then change the table definitions to reflect the actual encoding, and then load the dump back.

This may also be caused by the mbstring extension not being installed, which would explain the difference between your dev and production environments.

Check out this post; it might give you a few more answers.

Try mysql_set_charset('utf8') right after connecting to MySQL (with the mysqli extension, the equivalent is mysqli_set_charset($link, 'utf8')). Then it should work.

After two days of fighting this bug, I finally figured out the issue. Thanks to @yourcommonsense, @robsquires, and a friend of mine from work for good resources that helped to debug it.

The issue was that when the SQL dump file was imported into the database, the charset for the server, database, client, and connection was set to latin1 (the status command helped to figure that out). The command-line client was set to latin1 as well, which is why it showed the right characters, but the connection from the PHP code was UTF-8, so the data was encoded a second time. I ended up with double encoding.
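
The same check that the status command provided can be sketched from PHP. nonUtf8Settings() below is a hypothetical helper that takes the name => value pairs returned by SHOW VARIABLES LIKE 'character_set_%' and lists the ones that are not utf8. The sample values are an assumption modeled on the situation described above, not captured output.

```php
<?php
// Hypothetical helper: $vars maps MySQL variable names to their values.
function nonUtf8Settings(array $vars): array
{
    $relevant = [
        'character_set_client',
        'character_set_connection',
        'character_set_results',
        'character_set_database',
        'character_set_server',
    ];

    $mismatched = [];
    foreach ($relevant as $name) {
        $value = $vars[$name] ?? '(unset)';
        // Accept both utf8 and utf8mb4.
        if (strpos($value, 'utf8') !== 0) {
            $mismatched[$name] = $value;
        }
    }
    return $mismatched;
}

// Assumed sample values resembling the import-time configuration.
$duringImport = [
    'character_set_client'     => 'latin1',
    'character_set_connection' => 'latin1',
    'character_set_results'    => 'latin1',
    'character_set_database'   => 'latin1',
    'character_set_server'     => 'latin1',
];
print_r(nonUtf8Settings($duringImport));

// With a live mysqli connection ($db is assumed), the map could be built as:
// $vars = [];
// foreach ($db->query("SHOW VARIABLES LIKE 'character_set_%'") as $row) {
//     $vars[$row['Variable_name']] = $row['Value'];
// }
```

An empty result means all five settings agree on utf8. Anything else points at the layer that still uses another charset.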

Solution:

  1. mysqldump the tables and the data (while still in latin1)
  2. drop the database
  3. set the default charsets to UTF-8 in /etc/my.cnf, as Rob Squires mentioned
  4. restart MySQL
  5. create the database again with the right charset and collation
  6. load the dump file back into it

And it works fine.

Thanks to everyone for contributing!


 