PostgreSQL sort order affected by succeeding characters
I want to sort 'test last name' and 'test2 last name' so that the former comes before the latter. My understanding is that strings are compared character by character from left to right until they differ, so the characters after the first differing position should no longer matter. However, as shown below, 'test' comes before 'test2', but as soon as I append another character, the order changes.

Why does this happen? What collation should I use to get the desired order? Note that converting the strings to bytea yields the desired order.
test=# SELECT 'test last name' < 'test2 last name' COLLATE "en_US";
?column?
----------
f
(1 row)
test=# SELECT 'test last' < 'test2 last' COLLATE "en_US";
?column?
----------
f
(1 row)
test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test' < 'test2' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 l' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 l' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test l' < 'test2 l' COLLATE "en_US";
?column?
----------
f
(1 row)
test=# SELECT 'test l' < 'test2l' COLLATE "en_US";
?column?
----------
f
(1 row)
test=# SELECT 'test ' < 'test2l' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test last name'::bytea < 'test2 last name'::bytea;
?column?
----------
t
(1 row)
White space is a special character in ICU collation.
See the demo: https://www.unicode.org/reports/tr10/#Variable_Weighting_Examples
Also here: http://www.unicode.org/reports/tr35/tr35-collation.html#table-collation-settings
Simple explanation: https://unicode-org.github.io/icu/userguide/collation/customization/ignorepunct.html#shift-trimmed
You can run the following test:
CREATE COLLATION coll_shifted(provider = icu, locale = 'en-u-ka-shifted');
CREATE COLLATION coll_noignore(provider = icu, locale = 'en-u-ka-noignore');
SELECT 'test last name' < 'test2 last name' COLLATE coll_shifted
union all
SELECT 'test last name' < 'test2 last name' COLLATE coll_noignore
union all
SELECT 'test last name' < 'test2 last name' COLLATE "en_US";
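To apply one of these collations to an actual sort rather than a single comparison, it can be attached to the ORDER BY expression. The following is only a sketch: it assumes a PostgreSQL server built with ICU support and an ICU version that accepts the 'en-u-ka-noignore' locale tag, and the sample rows are made up for illustration.

```sql
-- Assumption: ICU support is available; IF NOT EXISTS makes the statement safe to re-run.
CREATE COLLATION IF NOT EXISTS coll_noignore (provider = icu, locale = 'en-u-ka-noignore');

-- Hypothetical sample rows; the collation is applied only to this sort.
SELECT name
FROM (VALUES ('test2 last name'), ('test last name'), ('test')) AS t(name)
ORDER BY name COLLATE coll_noignore;
```

With the non-ignorable setting, the space should take part in the comparison as an ordinary character, so 'test last name' is expected to sort before 'test2 last name'. The exact result depends on the ICU version installed.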
If you just want to compare by code point, you can use COLLATE "C" or COLLATE "POSIX".
That's how natural language collations work. If you want to compare character by character and have the space character treated like any other character, use the C collation:
SELECT 'test last name' < 'test2 last name' COLLATE "C";
?column?
══════════
t
(1 row)
But don't complain if 'Z' < 'a'...
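The caveat about 'Z' < 'a' can be checked directly. This is a small sketch with made-up sample rows: under the C collation strings are compared byte by byte, so every uppercase ASCII letter sorts before every lowercase one.

```sql
-- Byte-wise comparison: 'Z' is 0x5A and 'a' is 0x61, so this is true under "C".
SELECT 'Z' < 'a' COLLATE "C" AS z_before_a;

-- Hypothetical sample rows, sorted by byte value.
SELECT name
FROM (VALUES ('apple'), ('Zebra'), ('test2 last name'), ('test last name')) AS t(name)
ORDER BY name COLLATE "C";
```

The second query returns 'Zebra' first, then 'apple', 'test last name' and 'test2 last name'. The space (0x20) sorts before '2' (0x32), which gives the order asked for, but mixed-case data no longer sorts the way a dictionary would.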