
MySQL wrong order of UTF-8 words

When I add UTF-8 words to a table column and execute an ordered SELECT, the sort order is wrong. With DESC the order is correct, but with ASC it is wrong. How can I fix that? Let me explain with an example. Take a MySQL table with a Slovak collation:

CREATE TABLE IF NOT EXISTS test (
   aaa varchar(255) CHARACTER SET utf8 COLLATE utf8_slovak_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_slovak_ci;

Now let's insert some values containing UTF-8 words:

INSERT INTO test (aaa) VALUES
('Leco'),
('Lečo'),
('Ledo'),
('Chovatelstvo'),
('Chovateľstvo');

The Slovak alphabet is explained here, including which letters come after which: http://en.wikipedia.org/wiki/Slovak_orthography

Now when I select with ORDER BY, I expect to get the following result:

SELECT aaa FROM test ORDER BY aaa ASC
Chovatelstvo
Chovateľstvo
Leco
Lečo
Ledo

I also expect exactly the opposite order for DESC. But here is what I actually get:

SELECT aaa FROM test ORDER BY aaa ASC
Chovateľstvo
Chovatelstvo
Leco
Lečo
Ledo

and DESC:

SELECT aaa FROM test ORDER BY aaa DESC
Ledo
Lečo
Leco
Chovateľstvo
Chovatelstvo

You can see that

Chovateľstvo
Chovatelstvo

always appear in this order, regardless of ASC or DESC. I noticed that if I insert the rows in the opposite order, they may end up as

Chovatelstvo
Chovateľstvo

meaning that the actual order is reversed, but it is again the same for ASC and DESC. It is as if MySQL considered the two letters 'l' and 'ľ' equal.

I tried this with an older version of MySQL, as well as the newest version of MariaDB on another server, and the result is the same.

Any idea what causes this and how to fix it?

In both the utf8_slovak_ci and utf8_general_ci collations, the letter ľ and the letter l are considered the same.

You can see this by observing that this query returns true (1):

select _utf8 'Chovateľstvo' collate utf8_slovak_ci = _utf8 'Chovatelstvo'
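To narrow this down to the two letters themselves, the same comparison can be run under several collations side by side. This is only a sketch: the column aliases are made up for illustration, and it assumes the client sends the literals as UTF-8 (for example after SET NAMES utf8). If the claim above holds, the first two columns should come back as 1 and the utf8_bin column as 0.

```sql
-- Assumption: the connection character set is utf8, so the literal bytes
-- match the _utf8 introducer. The aliases are illustrative names only.
SELECT
  _utf8 'ľ' COLLATE utf8_slovak_ci  = _utf8 'l' AS slovak_ci_equal,
  _utf8 'ľ' COLLATE utf8_general_ci = _utf8 'l' AS general_ci_equal,
  _utf8 'ľ' COLLATE utf8_bin        = _utf8 'l' AS bin_equal;
```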

The designers of that collation evidently decided that ľ and l belong together in the dictionary. The only collations I can find that do not do that are latin2_hungarian_ci and cp1250_czech_cs. But to use either of those you'll have to change your character set.

If you must have them be different, you could try the utf8_bin collation. But that compares strings by character code, so it is fully case sensitive and does not follow Slovak alphabetical order.
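One way to keep the Slovak ordering and still separate l from ľ is to use utf8_bin only as a tie-breaker, as a second ORDER BY key. This is a sketch against the test table from the question, not something tested on every MySQL or MariaDB version. Because 'l' has a lower character code than 'ľ', the ascending query should place Chovatelstvo before Chovateľstvo, and the descending query should return the exact reverse.

```sql
-- Primary key: Slovak collation (ľ and l tie here).
-- Secondary key: binary comparison, used only to break those ties.
SELECT aaa
FROM test
ORDER BY aaa COLLATE utf8_slovak_ci ASC,
         aaa COLLATE utf8_bin ASC;

-- Reverse both keys to get the mirrored order.
SELECT aaa
FROM test
ORDER BY aaa COLLATE utf8_slovak_ci DESC,
         aaa COLLATE utf8_bin DESC;
```

Only rows that tie under utf8_slovak_ci are affected by the second key, so the rest of the result keeps its Slovak order.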

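The way ORDER BY works is basically correct given the rules of the collation. Rows whose values compare as equal may come back in any relative order, which is why ASC and DESC do not mirror each other here.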
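To see what the collation actually sorts on, the sort keys of the two letters can be inspected with WEIGHT_STRING(), which is available in MySQL 5.6 and later and in MariaDB 10.0 and later. This is a sketch with illustrative aliases, assuming a utf8 connection. If both hex values are identical, the server has no basis for ordering one word before the other.

```sql
-- Assumption: the connection character set is utf8.
-- Identical hex output means the two letters sort as equal.
SELECT
  HEX(WEIGHT_STRING(_utf8 'l' COLLATE utf8_slovak_ci)) AS weight_l,
  HEX(WEIGHT_STRING(_utf8 'ľ' COLLATE utf8_slovak_ci)) AS weight_l_caron;
```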

Maybe there's a defect in the collation? You could submit a defect report to the MySQL team at https://bugs.mysql.com/
