
Encoding error with Rails 2.3 on Ruby 1.9.3

I'm in the process of upgrading a legacy Rails 2.3 app to something more modern, and I'm running into an encoding issue. I've read all the existing answers I can find on this issue, but I'm still having problems.

Rails version: 2.3.17, Ruby version: 1.9.3p385

My MySQL tables use the default charset utf8 with collation utf8_general_ci. Before 1.9, I used the original mysql gem without incident. After upgrading to 1.9, retrieving anything that contains UTF-8 characters triggers this well-documented problem:

ActionView::TemplateError (incompatible character encodings: ASCII-8BIT and UTF-8)
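The error is easy to reproduce outside Rails. The minimal sketch below assumes, as is commonly reported for the original mysql gem on 1.9, that the driver returns text labelled ASCII-8BIT (binary) while the view text is UTF-8. The helpers binary_db_value and incompatible? are hypothetical and exist only for illustration:

```ruby
# encoding: utf-8
# Sketch: reproduce "incompatible character encodings: ASCII-8BIT and UTF-8".
# Assumption: the driver returns DB text labelled ASCII-8BIT (binary),
# while the template text is labelled UTF-8.

# Hypothetical stand-in for a value fetched by the old driver: the bytes
# are valid UTF-8, but the string is labelled as binary.
def binary_db_value
  "Repoussé".dup.force_encoding(Encoding::ASCII_8BIT)
end

# Returns true when Ruby refuses to concatenate the two strings.
def incompatible?(a, b)
  a + b
  false
rescue Encoding::CompatibilityError
  true
end

template_text = "Café: " # UTF-8 text containing a non-ASCII character

puts incompatible?(template_text, binary_db_value) # => true

# Relabelling (not transcoding) the binary bytes as UTF-8 makes the join legal.
puts template_text + binary_db_value.force_encoding(Encoding::UTF_8) # => Café: Repoussé
```

Relabelling with force_encoding only helps when the bytes really are UTF-8. It does not repair text that was already stored incorrectly.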

I switched to the mysql2 gem for its superior handling. I no longer see exceptions, but the text is clearly not being decoded correctly. For example, what appears in the DB as the string Repoussé is rendered by Rails as RepoussÃ©, and “Boat” appears as “Boatâ€, etc.

A few more details:

  • I see the same results when I use the ruby-mysql gem as the driver.
  • I've added encoding: utf8 lines to each entry in my database.yml.

I've also added the following to my environment.rb:

# Assume external data (files, IO) is UTF-8, and transcode what is read into UTF-8
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

It has occurred to me that there may be a mismatch, with the old version of the app having written latin1 into the utf8 fields of the database. However, all of the characters appear correctly when viewed in the mysql command line client.

Thanks in advance for any advice, much appreciated!

UPDATE: I now believe the issue is that my utf8 data is being coerced through a binary conversion into latin1 on its way out of the DB. I'm just not sure where.

mysql> SELECT CONVERT(CONVERT(name USING BINARY) USING latin1) AS latin1, CONVERT(CONVERT(name USING BINARY) USING utf8) AS utf8 FROM items WHERE id=myid;
+-------------+----------+
| latin1      | utf8     |
+-------------+----------+
| RepoussÃ©   | Repoussé |
+-------------+----------+
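As an analogy (this is not MySQL code), CONVERT(... USING BINARY) followed by USING latin1 keeps the stored bytes and only changes their label, much like Ruby's force_encoding. By contrast, CONVERT(name USING latin1) converts the characters, like encode. The helper names relabel and transcode are hypothetical:

```ruby
# encoding: utf-8
# Analogy only: Ruby's two ways of "changing" a string's encoding.
#   relabel   ~ CONVERT(CONVERT(x USING BINARY) USING charset): bytes kept
#   transcode ~ CONVERT(x USING charset): characters kept, bytes rewritten

def relabel(text, charset)
  text.dup.force_encoding(charset)
end

def transcode(text, charset)
  text.encode(charset)
end

name = "Repoussé"
p name.bytes.last(2)                             # => [195, 169]  é in UTF-8
p relabel(name, "ISO-8859-1").bytes.last(2)      # => [195, 169]  same bytes...
puts relabel(name, "ISO-8859-1").encode("UTF-8") # => RepoussÃ©   ...read as two characters
p transcode(name, "ISO-8859-1").bytes.last(2)    # => [115, 233]  "sé" in latin1
```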

I have my encoding set to utf8 in database.yml. Any other ideas about where this could be coming from?

I finally figured out what my issue was. Although my databases were encoded in utf8, the app running on the original mysql gem had been writing latin1 text into the utf8 tables.

What threw me off was that the output from the mysql command line client looked correct. It is important to verify that your terminal, the database fields and the MySQL client are all running in utf8.

MySQL's client runs in latin1 by default. You can check which character sets it is using by issuing this query:

show variables like 'char%';

If everything is set up properly for utf8, you should see:

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

If these don't look correct, make sure the following is set in the [client] section of your my.cnf config file:

default-character-set = utf8

Then add the following to the [mysqld] section:

# use utf8 by default
character-set-server=utf8
collation-server=utf8_general_ci

Make sure to restart the mysql daemon before relaunching the client, and then verify the settings.

NOTE: This doesn't change the charset or collation of existing databases. It only ensures that any new databases default to utf8 and that the client displays in utf8.

After I did this, the characters in the mysql client matched what I was getting from the mysql2 gem. I was also able to verify that this content was latin1 by temporarily switching to encoding: latin1 in my database.yml.

One extremely handy query for finding problem rows compares LENGTH (bytes) with CHAR_LENGTH (characters). The two differ only for rows that contain multi-byte characters:

SELECT id, name FROM items WHERE LENGTH(name) != CHAR_LENGTH(name);
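The same byte-versus-character comparison can be sketched in Ruby, where bytesize plays the role of LENGTH and length plays the role of CHAR_LENGTH. It flags correctly encoded accented text as well as double-encoded text, so it finds candidates to inspect rather than confirmed bad rows. Row and the sample data are hypothetical:

```ruby
# encoding: utf-8
# Analogue of: SELECT id, name FROM items WHERE LENGTH(name) != CHAR_LENGTH(name);
# bytesize ~ LENGTH (bytes), length ~ CHAR_LENGTH (characters).

Row = Struct.new(:id, :name) # hypothetical stand-in for an items row

def multibyte_rows(rows)
  rows.select { |row| row.name.bytesize != row.name.length }
end

rows = [
  Row.new(1, "Boat"),       # ASCII only: 4 bytes, 4 characters
  Row.new(2, "Repoussé"),   # correct UTF-8: 9 bytes, 8 characters
  Row.new(3, "RepoussÃ©")   # double-encoded: 11 bytes, 9 characters
]

multibyte_rows(rows).each { |row| puts "#{row.id}: #{row.name}" }
```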

There are a lot of scripts out there to convert latin1 contents to utf8, but what worked best for me was dumping all of the databases as latin1 and stuffing the contents back in as utf8:

mysqldump -u root -p --opt --default-character-set=latin1 --skip-set-charset  DBNAME > DBNAME.sql

mysql -u root -p --default-character-set=utf8  DBNAME < DBNAME.sql
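For each value, the latin1-dump / utf8-reload round trip undoes one layer of encoding. The stored UTF-8 text is converted back to latin1 bytes, and those bytes are then read as UTF-8. The Ruby sketch below shows that reversal. It assumes the value was double-encoded and models MySQL's latin1 as Windows-1252. repair_double_encoded is a hypothetical helper, and its valid_encoding? check is only a heuristic for skipping values that were already correct:

```ruby
# encoding: utf-8
# Sketch of what the latin1 dump + utf8 reload does to one stored value.
# Assumptions: the value is double-encoded, and MySQL's latin1 is modelled
# as Windows-1252.

def repair_double_encoded(value)
  return value if value.ascii_only? # nothing to undo

  # "Dump as latin1": turn each character back into a single cp1252 byte.
  # "Reload as utf8": read those bytes as UTF-8.
  candidate = value.encode("Windows-1252").force_encoding("UTF-8")

  # Heuristic: correctly encoded text (e.g. "é") becomes invalid UTF-8 here,
  # so keep the original in that case.
  candidate.valid_encoding? ? candidate : value
rescue Encoding::UndefinedConversionError
  # Characters outside cp1252 (e.g. CJK) mean this was not double-encoded.
  value
end

puts repair_double_encoded("RepoussÃ©") # => Repoussé
puts repair_double_encoded("Repoussé")  # => Repoussé (left unchanged)
```

One difference from MySQL: the MySQL manual describes its latin1 as mapping the five bytes that cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) to the matching control characters. Ruby's Windows-1252 table does not map them, so this sketch leaves any value containing them unchanged.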

I backed up my primary db first, then dumped into a test database and verified like crazy before rolling over to the corrected DB.

My understanding is that MySQL's translation can leave something to be desired with certain more complex characters. Since most of my multibyte chars are fairly common things (accent marks, quotes, etc.), this worked great for me.

Some resources that proved invaluable in sorting all of this out:

You say it all looks OK in the command line client, but perhaps your Terminal's character encoding isn't set to UTF-8? To check in OS X Terminal, click Terminal > Preferences > Settings > Advanced > Character Encoding. Also, check the data using a graphical tool like MySQL Query Browser at http://dev.mysql.com/downloads/gui-tools/5.0.html.

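Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.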

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM