PostgreSQL sort order affected by subsequent characters

Question

I want to sort 'test last name' and 'test2 last name' so that the former comes before the latter. My understanding is that characters are compared from left to right until they differ, so the characters after the first differing position should no longer matter. However, as shown below, 'test' comes before 'test2', but as soon as I append another character, the order changes. Why does this happen? What collation should I use to get the desired order? Note that converting the strings to bytea yields the desired order.

test=# SELECT 'test last name' < 'test2 last name' COLLATE "en_US";
 ?column?
----------
 f
(1 row)

test=# SELECT 'test last' < 'test2 last' COLLATE "en_US";
 ?column?
----------
 f
(1 row)

test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test ' < 'test2' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test' < 'test2' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test  ' < 'test2' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test  ' < 'test2 ' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test  ' < 'test2  ' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test  ' < 'test2 ' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test  ' < 'test2 l' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test ' < 'test2 l' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test l' < 'test2 l' COLLATE "en_US";
 ?column?
----------
 f
(1 row)

test=# SELECT 'test l' < 'test2l' COLLATE "en_US";
 ?column?
----------
 f
(1 row)

test=# SELECT 'test ' < 'test2l' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test last name'::bytea < 'test2 last name'::bytea;
 ?column? 
----------
 t
(1 row)

Answer 1

Whitespace is a special character in ICU collation.

See the demo: https://www.unicode.org/reports/tr10/#Variable_Weighting_Examples
Also here: http://www.unicode.org/reports/tr35/tr35-collation.html#table-collation-settings
Simple explanation: https://unicode-org.github.io/icu/userguide/collation/customization/ignorepunct.html#shift-trimmed

You can run the following test:
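
To see why characters after the first difference can matter, it may help to mimic the effect by hand. The sketch below assumes the collation ignores spaces at the first comparison level, so the strings are effectively compared as if the spaces were not there. Stripping the spaces with replace() and comparing under "C" is only a rough approximation of that, not what the collation does internally.

```sql
-- Rough approximation: drop the spaces, then compare byte by byte.
-- 'testlastname' vs 'test2lastname': the first difference is 'l' vs '2',
-- and '2' sorts before 'l', so the result is false.
SELECT replace('test last name', ' ', '') < replace('test2 last name', ' ', '') COLLATE "C";

-- 'test' vs 'test2': 'test' is a prefix of 'test2', so the result is true.
SELECT replace('test ', ' ', '') < replace('test2 ', ' ', '') COLLATE "C";
```

Both results match what the question observed under "en_US" for the corresponding strings with spaces.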

CREATE COLLATION coll_shifted(provider = icu, locale = 'en-u-ka-shifted');
CREATE COLLATION coll_noignore(provider = icu, locale = 'en-u-ka-noignore');

SELECT 'test last name' < 'test2 last name' COLLATE coll_shifted
union all
SELECT 'test last name' < 'test2 last name' COLLATE coll_noignore
union all
SELECT 'test last name' < 'test2 last name' COLLATE "en_US";

If you just want to compare by code point, you can use COLLATE "C" or COLLATE "POSIX".
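
If byte-wise ordering is what you want for a whole column, the collation can be set on the column itself or applied per query. A minimal sketch, using a hypothetical table people with a last_name column (both names are made up for illustration):

```sql
-- Hypothetical table; the column always compares byte by byte.
CREATE TABLE people (
    last_name text COLLATE "C"
);

INSERT INTO people VALUES ('test2 last name'), ('test last name');

-- Uses the column's "C" collation, so 'test last name' comes first.
SELECT last_name FROM people ORDER BY last_name;

-- Alternatively, keep the column's default collation and override it per query.
SELECT last_name FROM people ORDER BY last_name COLLATE "C";
```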

Answer 2

That's how natural language collations work. If you want to compare character by character and have the space character treated like any other character, use the C collation:

SELECT 'test last name' < 'test2 last name' COLLATE "C";

 ?column? 
══════════
 t
(1 row)

But don't complain if 'Z' < 'a' …
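
A quick illustration of that caveat, as a sketch: under "C", strings are compared byte by byte, so every uppercase ASCII letter sorts before every lowercase one.

```sql
-- 'Z' is byte 0x5A and 'a' is byte 0x61, so this is true under "C".
SELECT 'Z' < 'a' COLLATE "C";

-- Consequently 'Zebra' sorts before 'apple' here.
SELECT name
FROM (VALUES ('apple'), ('Zebra')) AS t(name)
ORDER BY name COLLATE "C";
```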
