
Rails / MySQL weird UTF-8 encoding issue

I have an app that uses the Cyrillic (Macedonian) alphabet. I have an alphabet menu with all of the letters (typed manually as an array, which I call from a helper); each letter links to the items whose first character is that letter. It seems that "К" and "Ќ" (and likewise "Г" and "Ѓ") list the same items, as if they were the same letter. It works fine in development; I'm not sure why it does this in production. I have set UTF-8 encoding on the production database.

Here's the prod log. It's not getting the same character.

App 18197 stderr: Started GET "/letterfilter?title=%D0%8C" for IP at 2015-07-30 12:03:46 -0400
App 18197 stderr: Processing by PostsController#letterfilter as HTML
App 18197 stderr:   Parameters: {"title"=>"Ќ"}
App 18197 stderr:   Rendered posts/letterfilter.html.haml within layouts/application (4.3ms)
App 18197 stderr:   Rendered posts/_search.html.haml (0.8ms)
App 18197 stderr:   Rendered shared/_header.html.haml (9.6ms)
App 18197 stderr:   Rendered shared/_footer.html.haml (0.2ms)
App 18197 stderr: Completed 200 OK in 18ms (Views: 16.6ms | ActiveRecord: 0.2ms)

App 18197 stderr: Started GET "/letterfilter?title=%D0%9A" for IP at 2015-07-30 12:03:51 -0400
App 18197 stderr: Processing by PostsController#letterfilter as HTML
App 18197 stderr:   Parameters: {"title"=>"К"}
App 18197 stderr:   Rendered posts/letterfilter.html.haml within layouts/application (4.9ms)
App 18197 stderr:   Rendered posts/_search.html.haml (0.7ms)
App 18197 stderr:   Rendered shared/_header.html.haml (7.7ms)
App 18197 stderr:   Rendered shared/_footer.html.haml (0.2ms)
App 18197 stderr: Completed 200 OK in 17ms (Views: 14.2ms | ActiveRecord: 0.9ms)

What might be causing this issue? Should I update my database encoding to utf8mb4?

Any help is welcome. Thanks.

Ќ is hex D08C in utf8 or utf8mb4. Cyrillic is completely covered by either CHARACTER SET. К is D09A, as can be seen in the ?title= parameters in your log.
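To check those byte values from the Ruby side, something like the following could be used. It is only a sketch; the helper name `utf8_hex` is made up for illustration and is not part of Rails or of the question's code.

```ruby
require "uri"

# Hypothetical helper (not from the original post): returns the UTF-8
# bytes of a string as an uppercase hex string.
def utf8_hex(str)
  str.encode("UTF-8").unpack1("H*").upcase
end

kje = "\u040C" # Ќ, CYRILLIC CAPITAL LETTER KJE
ka  = "\u041A" # К, CYRILLIC CAPITAL LETTER KA

puts utf8_hex(kje)                       # D08C
puts utf8_hex(ka)                        # D09A
puts URI.encode_www_form_component(kje)  # %D0%8C
puts URI.encode_www_form_component(ka)   # %D0%9A
```

The percent-encoded forms match the two `?title=` values in the log, so the two requests really do carry different letters.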

Hmmm, this is quite interesting:

mysql> SELECT 'К' = 'Ќ' COLLATE utf8_bin AS bin,
              'К' = 'Ќ' COLLATE utf8_general_ci AS general,
              'К' = 'Ќ' COLLATE utf8_unicode_ci AS unicode;
+-----+---------+---------+
| bin | general | unicode |
+-----+---------+---------+
|   0 |       1 |       0 |
+-----+---------+---------+

mysql> SELECT 'Г' = 'Ѓ' COLLATE utf8_bin AS bin,
              'Г' = 'Ѓ' COLLATE utf8_general_ci AS general,
              'Г' = 'Ѓ' COLLATE utf8_unicode_ci AS unicode;
+-----+---------+---------+
| bin | general | unicode |
+-----+---------+---------+
|   0 |       1 |       0 |
+-----+---------+---------+

The bit patterns are different, so utf8_bin treats the letters as unequal. The surprise is that utf8_general_ci treats them as equal while utf8_unicode_ci does not; usually, whenever utf8_general_ci compares two characters as equal, so does utf8_unicode_ci.
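If the production column turns out to use utf8_general_ci, one possible workaround is to force a different collation in the filter query. The sketch below is built on assumptions: the method name `letter_filter_conditions` is hypothetical, the column is assumed to be `title`, the filter is assumed to be a LIKE prefix match, and the connection character set is assumed to be utf8 (with utf8mb4 the collation name would need to change accordingly).

```ruby
# Hypothetical helper (not from the original post): builds a
# [sql_fragment, bind_value] pair for a "starts with this letter" filter,
# forcing a collation that, per the output above, keeps К/Ќ and Г/Ѓ distinct.
# `column` and `collation` are interpolated into the SQL, so they must be
# trusted constants, never user input.
def letter_filter_conditions(letter, column: "title", collation: "utf8_unicode_ci")
  # Escape LIKE wildcards so the letter is matched literally.
  escaped = letter.gsub(/[\\%_]/) { |ch| "\\#{ch}" }
  ["#{column} LIKE ? COLLATE #{collation}", "#{escaped}%"]
end

sql, pattern = letter_filter_conditions("\u040C") # Ќ
puts sql      # title LIKE ? COLLATE utf8_unicode_ci
puts pattern  # Ќ%
```

In the controller this might be used as `Post.where(*letter_filter_conditions(params[:title]))`, assuming a `Post` model behind `PostsController`. Passing `collation: "utf8_bin"` would also separate the letters, but it makes the match case-sensitive.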

Back to your question... What do you mean by "it's not getting the same character"?
