
Encoding error with Rails 2.3 on Ruby 1.9.3

I'm in the process of upgrading an old legacy Rails 2.3 app to something more modern and running into an encoding issue. I've read all the existing answers I can find on this issue but I'm still running into problems.

Rails ver: 2.3.17 Ruby ver: 1.9.3p385

My MySQL tables are default charset: utf8 , collation: utf8_general_ci . Prior to 1.9 I was using the original mysql gem without incident. After upgrading to 1.9, retrieving anything containing utf8 characters raised this well-documented error:

ActionView::TemplateError (incompatible character encodings: ASCII-8BIT and UTF-8)
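
For reference, this error can be reproduced in plain Ruby without Rails or MySQL. The snippet below is a minimal sketch: the method name concat_error_message and the sample strings are made up for illustration, and it assumes the old driver hands back strings tagged as ASCII-8BIT (binary) while the template text is UTF-8.

```ruby
# encoding: utf-8

# Returns the message Ruby raises when a binary (ASCII-8BIT) string holding
# non-ASCII bytes is concatenated with a UTF-8 string that also has non-ASCII
# characters. Returns nil if no error is raised.
def concat_error_message
  # Assumption: this mimics a value returned by a driver that does not tag encodings.
  from_driver = "Repouss\u00E9".dup.force_encoding("ASCII-8BIT")
  # A UTF-8 string such as one coming from a template.
  in_template = "caf\u00E9"
  from_driver + in_template
  nil
rescue Encoding::CompatibilityError => e
  e.message
end

puts concat_error_message
```

Strings that contain only ASCII bytes are compatible with either encoding, which is why the error only appears for rows with accented or other multi-byte characters.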

I switched to the mysql2 gem for its superior handling and I no longer see exceptions, but things are definitely not encoding correctly. For example, what appears in the DB as the string Repoussé is being rendered by Rails as RepoussÃ© , “Boat” appears as â€œBoatâ€ , etc.

A few more details:

  • I see the same results when I use the ruby-mysql gem as the driver.
  • I've added encoding: utf8 lines to each entry in my database.yml

I've also added the following to my environment.rb :

Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

It has occurred to me that I may have some mismatch where latin1 was being written by the old version of the app into the utf8 fields of the database or something, but all of the characters appear correctly when viewed in the mysql command line client.

Thanks in advance for any advice, much appreciated!

UPDATE: I now believe the issue is that my utf8 data is being coerced through a binary conversion into latin1 on the way out of the db. I'm just not sure where.

mysql> SELECT CONVERT(CONVERT(name USING BINARY) USING latin1) AS latin1, CONVERT(CONVERT(name USING BINARY) USING utf8) AS utf8 FROM items WHERE id=myid;
+-------------+----------+
| latin1      | utf8     |
+-------------+----------+
| RepoussÃ©   | Repoussé |
+-------------+----------+
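
The same experiment as the CONVERT query above can be sketched in Ruby: take one sequence of raw bytes and decode it under two different encodings. The helper name read_bytes_as is hypothetical, and ISO-8859-1 stands in for MySQL's latin1 here (MySQL's latin1 is closer to Windows-1252, so this is an approximation).

```ruby
# encoding: utf-8

# Decode the given raw bytes as if they were in encoding_name,
# then return the result as a UTF-8 string for display.
def read_bytes_as(bytes, encoding_name)
  bytes.dup.force_encoding(encoding_name).encode("UTF-8")
end

# The raw UTF-8 bytes of "Repoussé", tagged as binary (like USING BINARY in SQL).
raw = "Repouss\u00E9".dup.force_encoding("ASCII-8BIT")

puts read_bytes_as(raw, "ISO-8859-1") # é shows up as two characters: Ã©
puts read_bytes_as(raw, "UTF-8")      # é shows up as one character
```

The bytes never change between the two calls; only the label used to interpret them does.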

I have my encoding set to utf8 in database.yml, any other ideas where this could be coming from?

I finally figured out what my issue was. While my databases were encoded with utf8 , the app with the original mysql gem was injecting latin1 text into the utf8 tables.

What threw me off was that the output from the mysql command line client looked correct. It is important to verify that your terminal, the database fields and the MySQL client are all running in utf8 .
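
To illustrate why a latin1 client can hide the problem, here is a rough simulation in Ruby of what I believe was happening. All three method names are hypothetical, the conversions only approximate what the MySQL server does on a connection, and ISO-8859-1 is used in place of MySQL's latin1.

```ruby
# encoding: utf-8

LATIN1 = "ISO-8859-1"

# UTF-8 bytes arrive over a connection declared as latin1: the server treats
# every byte as one latin1 character and converts each to utf8 for storage.
def store_via_latin1_connection(utf8_text)
  utf8_text.dup.force_encoding(LATIN1).encode("UTF-8")
end

# Reading over a latin1 connection reverses that conversion, so a UTF-8
# terminal receives the original bytes and displays them correctly.
def read_via_latin1_connection(stored)
  stored.encode(LATIN1).force_encoding("UTF-8")
end

# Reading over a utf8 connection returns the stored characters unchanged.
def read_via_utf8_connection(stored)
  stored
end

stored = store_via_latin1_connection("Repouss\u00E9")
puts read_via_latin1_connection(stored) # looks fine, as in the latin1 mysql client
puts read_via_utf8_connection(stored)   # shows the double-encoded text, as in Rails
```

The two mistakes cancel out as long as both the writer and the reader use latin1, which matches what I saw before switching drivers.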

MySQL's client runs in latin1 by default. You can discover what it is running in by issuing this query:

show variables like 'char%';

If set up properly for utf8 you should see:

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

If these don't look correct, make sure the following is set in the [client] section of your my.cnf config file:

default-character-set = utf8

And add the following to the [mysqld] section:

# use utf8 by default
character-set-server=utf8
collation-server=utf8_general_ci

Make sure to restart the mysql daemon before relaunching the client and then verify.

NOTE: This doesn't change the charset or collation of existing databases, just ensures that any new databases created will default into utf8 and that the client will display in utf8 .

After I did this I saw characters in the mysql client that matched what I was getting from the mysql2 gem. I was also able to verify that this content was latin1 by switching to " encoding: latin1 " temporarily in my database.yml .

One extremely handy query to find issues is using char length to find the rows with multi-byte characters:

SELECT id, name FROM items WHERE LENGTH(name) != CHAR_LENGTH(name);
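
The same check can be done on the Ruby side, where String#bytesize corresponds to LENGTH and String#length to CHAR_LENGTH. This is only a sketch: the method name multibyte? and the sample names are made up for illustration.

```ruby
# encoding: utf-8

# True when the string needs more bytes than it has characters,
# i.e. it contains at least one multi-byte character.
def multibyte?(text)
  text.bytesize != text.length
end

names = ["Boat", "Repouss\u00E9"]
names.each do |name|
  puts "#{name}: #{name.length} chars, #{name.bytesize} bytes" if multibyte?(name)
end
```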

There are a lot of scripts out there to convert latin1 contents to utf8 , but what worked best for me was dumping all of the databases as latin1 and stuffing the contents back in as utf8 :

mysqldump -u root -p --opt --default-character-set=latin1 --skip-set-charset  DBNAME > DBNAME.sql

mysql -u root -p --default-character-set=utf8  DBNAME < DBNAME.sql

I backed up my primary db first, then dumped into a test database and verified like crazy before rolling over to the corrected DB.
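
For spot-checking values before and after the conversion, something along these lines may help. It is a heuristic sketch, not part of the procedure above: the method name looks_double_encoded? is hypothetical, and it uses ISO-8859-1 in place of MySQL's latin1, so mangled characters that only exist in Windows-1252 (curly quotes, for example) are not detected.

```ruby
# encoding: utf-8

# Heuristic: a string looks double-encoded if mapping its characters back to
# single latin1 bytes yields valid UTF-8 that differs from the original.
def looks_double_encoded?(text)
  candidate = text.encode("ISO-8859-1").force_encoding("UTF-8")
  candidate.valid_encoding? && candidate != text
rescue Encoding::UndefinedConversionError
  # The text has characters outside ISO-8859-1, so this check does not apply.
  false
end

["Boat", "Repouss\u00E9", "Repouss\u00C3\u00A9"].each do |value|
  puts "#{value}: #{looks_double_encoded?(value)}"
end
```

A legitimate string that happens to contain a sequence like Ã© would be flagged as well, so treat a positive result as a prompt to look at the row rather than as proof.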

My understanding is that MySQL's translation can leave some things to be desired with certain more complex characters but since most of my multibyte chars are fairly common things (accent marks, quotes, etc), this worked great for me.

Some resources that proved invaluable in sorting all of this out:

You say it all looks OK in the command line client, but perhaps your Terminal's character encoding isn't set to show UTF8? To check in OS X Terminal, click Terminal > Preferences > Settings > Advanced > Character Encoding. Also, check using a graphical tool like MySQL Query Browser at http://dev.mysql.com/downloads/gui-tools/5.0.html .
