SQL ORDER BY a string value: What is it comparing? (Case sensitive?)

I would like to know what exactly SQL is comparing when we use the ORDER BY clause. More specifically, I'm interested in how it compares strings. Supposedly it sorts them alphabetically, but what is it actually comparing?

My hunch is that it compares the ASCII values of the characters starting from the left. That would also imply that the sorting is case sensitive ('Btest' would be smaller than 'atest'), but I am unable to find a source confirming this.
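The hypothesis above can be sketched in Python as a hypothetical `compare_by_code` comparator that looks at characters from the left. It decides at the first differing character by its numeric code. This only illustrates what a pure code-point comparison would produce (identical to ASCII order for ASCII-only strings). It does not describe how any particular database sorts.

```python
import functools

def compare_by_code(a: str, b: str) -> int:
    """Hypothetical comparator: compare character codes from the left.

    Returns a negative number if a < b, zero if equal, positive if a > b.
    When one string is a prefix of the other, the shorter one sorts first.
    """
    for ca, cb in zip(a, b):
        if ca != cb:
            # The first differing character decides, by its numeric code (ord).
            return ord(ca) - ord(cb)
    return len(a) - len(b)

words = ["atest", "Btest", "btest", "Atest"]
print(sorted(words, key=functools.cmp_to_key(compare_by_code)))
# ['Atest', 'Btest', 'atest', 'btest']
print(compare_by_code("Btest", "atest") < 0)  # True: 'B' (66) < 'a' (97)
```

Python's own `str` comparison already works code point by code point, so `sorted(words)` gives the same uppercase-first result. That is the case-sensitive ordering the question predicts.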

In MySQL, it depends on the effective collation. A collation is the set of rules that determines the position of characters in an ordered set and which characters are considered equal, and it typically involves natural-language rules. For example, Spanish used to treat ch as an independent letter sorted between c and d, and later switched to treating it as the individual letters c and h; MySQL has collations for both.
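This minimal sketch makes "a collation is a set of rules" concrete. It uses Python's built-in `sqlite3` module as a stand-in: this is SQLite, not MySQL. `Connection.create_collation` registers a comparison function, which SQLite then uses for `ORDER BY ... COLLATE`. The `trad_es` collation and its helpers are hypothetical. They give a toy approximation of traditional Spanish ordering (lowercase a to z only, with ch as one letter between c and d), not MySQL's actual Spanish collations.

```python
import sqlite3

# Hypothetical toy alphabet: "ch" is a single letter placed between "c" and "d".
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
            "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

def sort_key(s: str) -> list:
    """Split s into collation units ("ch" or a single char) and rank each one."""
    key, i = [], 0
    while i < len(s):
        unit = "ch" if s[i:i + 2] == "ch" else s[i]
        # Characters outside the toy alphabet sort after it, by code point.
        key.append(RANK.get(unit, len(ALPHABET) + ord(unit[0])))
        i += len(unit)
    return key

def traditional_spanish(a: str, b: str) -> int:
    """Comparison callback for SQLite: negative, zero or positive."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
conn.create_collation("trad_es", traditional_spanish)
conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany("INSERT INTO words VALUES (?)", [("dedo",), ("chico",), ("cuna",)])

binary = [r[0] for r in conn.execute("SELECT w FROM words ORDER BY w")]
trad = [r[0] for r in conn.execute("SELECT w FROM words ORDER BY w COLLATE trad_es")]
print(binary)  # ['chico', 'cuna', 'dedo']  ('h' < 'u' byte-wise)
print(trad)    # ['cuna', 'chico', 'dedo']  (all c-words before ch-words)
```

The data and the query stay the same, and only the collation changes. Yet the two orderings differ, which is the point the answer makes about MySQL.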

You can list the available collations with these commands:

SHOW COLLATION; -- Display all
SHOW COLLATION WHERE charset = 'utf8mb4'; -- Filter by encoding
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
|:--|:--|--:|:--|:--|--:|:--|
| utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes | Yes | 0 | NO PAD |
| utf8mb4_0900_as_ci | utf8mb4 | 305 | | Yes | 0 | NO PAD |
| utf8mb4_0900_as_cs | utf8mb4 | 278 | | Yes | 0 | NO PAD |
| utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | NO PAD |
| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 | PAD SPACE |
| utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 | PAD SPACE |
| utf8mb4_cs_0900_ai_ci | utf8mb4 | 266 | | Yes | 0 | NO PAD |
| utf8mb4_cs_0900_as_cs | utf8mb4 | 289 | | Yes | 0 | NO PAD |
| utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 | PAD SPACE |
| utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 | PAD SPACE |

[...]

Collation names in MySQL use some common substrings to indicate certain features:

  • ci / cs for Case Insensitive / Case Sensitive
  • ai / as for Accent Insensitive / Accent Sensitive

... and some others (the full list is in the collation naming conventions section of the MySQL manual).
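As a rough illustration of what the ai/as and ci/cs suffixes mean for equality, the hypothetical key functions below approximate each behaviour in Python. They use `unicodedata` normalization and `str.casefold`. This is an assumption-laden approximation: MySQL's `_0900` collations implement the Unicode Collation Algorithm, which these few lines do not reproduce.

```python
import unicodedata

def key_ai_ci(s: str) -> str:
    """Accent- and case-insensitive (approximation): drop accents, then casefold."""
    decomposed = unicodedata.normalize("NFD", s)  # "é" -> "e" + combining accent
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def key_as_ci(s: str) -> str:
    """Accent-sensitive, case-insensitive (approximation): keep accents, casefold."""
    return unicodedata.normalize("NFC", s).casefold()

def key_as_cs(s: str) -> str:
    """Accent- and case-sensitive: only normalize the Unicode representation."""
    return unicodedata.normalize("NFC", s)

for name, key in [("ai_ci", key_ai_ci), ("as_ci", key_as_ci), ("as_cs", key_as_cs)]:
    # Is "Résumé" equal to "resume"? Is "Résumé" equal to "résumé"?
    print(name, key("Résumé") == key("resume"), key("Résumé") == key("résumé"))
# ai_ci True True
# as_ci False True
# as_cs False False
```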

In MySQL, you can set the collation at several levels:

  • Server
  • Database
  • Table
  • Column
  • Connection
  • Individual strings in SQL

So there is always an effective collation, either explicit or implicit.
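SQLite has a similar layering, which makes the idea easy to try out. A column can declare a default collation, and an explicit `COLLATE` in an expression overrides it. This sketch uses Python's `sqlite3` module with SQLite's built-in `NOCASE` and `BINARY` collations, not MySQL ones. It counts equality matches rather than sorting, so the results are deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column-level default: comparisons on val use NOCASE unless overridden.
conn.execute("CREATE TABLE t (val TEXT COLLATE NOCASE)")
conn.executemany("INSERT INTO t VALUES (?)", [("A",), ("B",), ("a",), ("b",)])

def count(sql: str) -> int:
    """Run a COUNT(*) query and return the number."""
    return conn.execute(sql).fetchone()[0]

# Uses the column's NOCASE collation, so both 'A' and 'a' match.
column_level = count("SELECT COUNT(*) FROM t WHERE val = 'a'")
# An explicit COLLATE in the expression overrides the column's collation.
expression_level = count("SELECT COUNT(*) FROM t WHERE val = 'a' COLLATE BINARY")
print(column_level, expression_level)  # 2 1
```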

This type of question is easy to test.
Here I tested it on db<>fiddle. You can run the same script (with other values if needed) in your own database to check your local settings, collation, etc.

 create table sortable (val varchar(10));
 insert into sortable values ('A'), ('B'), ('a'), ('b');
 SELECT val FROM sortable ORDER BY val;
 | val |
 |:--  |
 | A   |
 | a   |
 | B   |
 | b   |

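db<>fiddle here

A pure ASCII comparison would return A, B, a, b. Here, uppercase and lowercase letters are interleaved, so the collation used in this test is not comparing raw ASCII codes. Because 'A' and 'a' compare equal under a case-insensitive collation, their relative order within the tie is not guaranteed.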
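The same experiment can be reproduced locally with Python's `sqlite3` module. There, the default collation is SQLite's `BINARY` (a byte-wise comparison) rather than anything MySQL uses. Under `BINARY` the question's ASCII hypothesis holds, while SQLite's `NOCASE` collation interleaves the letters. The trailing `, val` tie-breaker is added so that the order of 'A' and 'a', which compare equal under `NOCASE`, is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sortable (val VARCHAR(10))")
conn.executemany("INSERT INTO sortable VALUES (?)", [("A",), ("B",), ("a",), ("b",)])

def column(sql: str) -> list:
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Default BINARY collation: byte-wise, so all uppercase letters come first.
binary_order = column("SELECT val FROM sortable ORDER BY val")
# NOCASE folds ASCII case; the second key (BINARY) breaks the 'A'/'a' tie.
nocase_order = column("SELECT val FROM sortable ORDER BY val COLLATE NOCASE, val")
print(binary_order)  # ['A', 'B', 'a', 'b']
print(nocase_order)  # ['A', 'a', 'B', 'b']
```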

In Oracle, if you want to switch to character-code (binary) sorting, you can order by the binary value of the characters with NLSSORT:

SELECT id, random_varchar FROM your_table ORDER BY NLSSORT(random_varchar, 'NLS_SORT = BINARY')
