
How can I sort an SQL query but have certain UTF-8 characters be ordered as their normal equivalent? (e.g. É be regarded as E etc.)

I have a table of character names in a MySQL database.

I am trying to query the table and sort the rows alphabetically by name.

Some of the characters have names like "The Dagda", and the leading "The " needs to be ignored, so I am attempting to use:

select character_id, name from characters where is_del=0 order by trim('The ' from name)

Which seems to work...

Some of the other characters have UTF-8 characters in their names, such as "Ériu".

However, when my table is returned, these "É" entries are listed between "A" and "B".

I.e.:

Aengus, Amergin, Ériu, Balor, Banba, etc.

Preservation of these UTF characters is crucially important on the front end.

Does anyone know a method where these "É" characters and similar could be treated as "E" for purposes of sorting, but still be returned in the dataset as what they actually are?

Before asking, I suspected this may not be possible, but I am hoping someone here has run into a similar problem and might have a workaround.

Thanks in advance.

EDIT: changed UTF-16 to UTF-8 (my bad)

EDIT @Rick James:

I could not format this readably in a comment, but the output of the query with hex is as follows:

name | hex(name)

Aengus Ã“g | 41656E67757320C383E2809C67
Amergin | 416D657267696E
Ã‰riu | C383E280B0726975
Balor | 42616C6F72
Banba | 42616E6261

The third item down is Ériu. I am not sure why the names are rendering as above, but this is what is displayed in the phpMyAdmin interface when I run the query select character_id, name, hex(name) from characters order by trim('The ' from name)

The first character's full name should be Aengus Óg. (I am assuming this is again down to character set or collation, but I am unsure, so apologies for my ignorance here.)

"Double encoding" seems to be the problem. I discuss this somewhat in "Trouble with UTF-8 characters; what I see is not what I stored".

The stored hex for the first name, split by character:

41 65 6E 67 75 73 20 C383 E2809C 67

Óg is hex C393 67 in UTF-8.

Read as latin1, hex C3 93 67 is Ã“g.

Encode those characters as UTF-8 a second time to get C383 E2809C 67.
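The byte arithmetic above can be checked outside the database. This is a minimal Python sketch, not part of the original answer, and it assumes Python's cp1252 codec is a fair stand-in for what MySQL calls latin1 (the two agree on the bytes used here).

```python
# Reproduce the double encoding of "Aengus Óg" step by step.
# Assumption: Python's cp1252 codec approximates MySQL's latin1 character set.

correct = "Aengus Óg"

# Step 1: the client sends proper UTF-8 bytes.
utf8_bytes = correct.encode("utf-8")

# Step 2: those bytes are misread as latin1, one character per byte.
misread = utf8_bytes.decode("cp1252")

# Step 3: the misread characters are encoded as UTF-8 a second time.
double_encoded = misread.encode("utf-8")

print(utf8_bytes.hex().upper())      # 41656E67757320C39367
print(misread)                       # Aengus Ã“g
print(double_encoded.hex().upper())  # 41656E67757320C383E2809C67
```

The last line matches the hex(name) value reported in the question for the first row.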

CONVERT(BINARY(CONVERT('Aengus Ã“g' USING latin1))
               USING utf8mb4) --> 'Aengus Óg'

This seems to be "double encoding":

CONVERT(BINARY(CONVERT(CONVERT(UNHEX('C383E280B0726975') USING utf8mb4) USING latin1)) USING utf8mb4) --> 'Ériu'

With Ã‰riu as an intermediate step. This explains why it sorted with the A's: the stored value starts with Ã, which the collation treats as A.
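To see the effect on ordering, here is a rough Python sketch. The `sort_key` and `repair` helpers are hypothetical: `sort_key` only approximates an accent-insensitive MySQL collation by stripping combining marks and also drops a leading "The " as in the question, and `repair` assumes cp1252 behaves like MySQL's latin1. The stored names are rebuilt from the hex values in the question.

```python
import unicodedata


def sort_key(name: str) -> str:
    """Hypothetical key: ignore a leading 'The ', accents, and case."""
    if name.startswith("The "):
        name = name[len("The "):]
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def repair(text: str) -> str:
    """Undo one round of double encoding (cp1252 stands in for latin1)."""
    return text.encode("cp1252").decode("utf-8")


# Names as stored, decoded from the hex dump in the question.
stored = [
    bytes.fromhex(h).decode("utf-8")
    for h in (
        "42616E6261",                  # Banba
        "C383E280B0726975",            # double-encoded Ériu
        "42616C6F72",                  # Balor
        "41656E67757320C383E2809C67",  # double-encoded Aengus Óg
        "416D657267696E",              # Amergin
    )
]

# Before the repair, the double-encoded Ériu lands between Amergin and Balor.
print(sorted(stored, key=sort_key))

# After the repair it sorts under E. "The Dagda" comes from the question text,
# not from the hex dump, and is added as a plain literal.
fixed = [repair(name) for name in stored] + ["The Dagda"]
print(sorted(fixed, key=sort_key))
```

In MySQL itself no such key should be needed once the data is stored correctly: a utf8mb4 column with an accent-insensitive collation such as utf8mb4_unicode_ci compares É with E for ORDER BY while still returning the stored text unchanged.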

This is a common problem. It often goes unnoticed because browsers "fix" the mess.

Experiment with SELECTs against the table. If the first one works for you, then it is just Mojibake.

SELECT CONVERT(BINARY(CONVERT(my_column USING latin1))
               USING utf8mb4)
    FROM ... WHERE ...;
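The same dry run can be sketched client-side before touching the table. The hypothetical `try_repair` below applies the round trip from the SELECT and falls back to the original text when the round trip fails, which hints that the value was not double-encoded. It assumes cp1252 behaves like MySQL's latin1; Python's codec leaves bytes 81, 8D, 8F, 90 and 9D undefined, so values that involve those bytes are reported as unchanged by this sketch.

```python
from typing import Tuple


def try_repair(text: str) -> Tuple[str, bool]:
    """Return (possibly repaired text, whether the text changed)."""
    try:
        # Characters -> latin1 bytes -> reinterpret those bytes as UTF-8.
        candidate = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in cp1252, or not valid UTF-8 afterwards:
        # treat the value as already correct.
        return text, False
    return candidate, candidate != text


for value in ["Ã‰riu", "Ériu", "Balor"]:
    print(repr(value), try_repair(value))
```

Only the first value changes; the correctly stored "Ériu" and the plain ASCII "Balor" come back untouched.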

Read that other Q&A to see what steps went wrong to cause the problem. It likely involves storing UTF-8 characters in a column declared latin1.

ALTER TABLE ... CONVERT TO ... assumes that the data is correctly stored. But it wasn't. Now you have the CHARACTER SET correctly set on the columns, but the data in them has been Mojibaked. So, it needs something like

UPDATE tbl  SET
    col1 = CONVERT(BINARY(CONVERT(col1 USING latin1))
           USING utf8mb4),
    col2 = CONVERT(BINARY(CONVERT(col2 USING latin1))
           USING utf8mb4),
    ...
    ;
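For a table with many text columns, the statement can be generated rather than typed. This is only a sketch: `quote_identifier` and `build_repair_update` are hypothetical helpers, and `tbl`, `col1` and `col2` are the placeholders from the statement above, not real names.

```python
from typing import List


def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_repair_update(table: str, columns: List[str]) -> str:
    """Build the UPDATE shown above for the given (hypothetical) columns."""
    if not columns:
        raise ValueError("at least one column is required")
    assignments = ",\n".join(
        "    {col} = CONVERT(BINARY(CONVERT({col} USING latin1)) USING utf8mb4)".format(
            col=quote_identifier(column)
        )
        for column in columns
    )
    return "UPDATE {} SET\n{};".format(quote_identifier(table), assignments)


print(build_repair_update("tbl", ["col1", "col2"]))
```

The generated text is the same expression as in the hand-written UPDATE, repeated once per column.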

More on the fix: http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases

Rollback? If you are more comfortable rolling back to before the CONVERT TO, then ignore most of what I said above; you will need the 2-step ALTER after the rollback. (See that blog link.)
