I cannot understand the behavior of PostgreSQL (v11.10). Here is what I do:
create temp table test (first_name text, last_name text);
insert into test values
('Hanna', 'Beat'),
('JOAN', 'BEET'),
('Mark', 'Bernstein'),
('ALFRED', 'DOE'),
('henry', 'doe'),
('Henry', 'Doe'),
('Dennis', 'Doe');
select last_name, first_name from test order by last_name, first_name;
This is what I get.
 last_name | first_name
-----------+------------
 Beat      | Hanna
 BEET      | JOAN
 Bernstein | Mark
 doe       | henry
 Doe       | Dennis
 Doe       | Henry
 DOE       | ALFRED
(7 rows)
It looks like the sorting of the first three names is case-insensitive, but for the last four it's case-sensitive. Why is that so?
In other words, if the sorting were case-sensitive, I would expect the following order:
 last_name | first_name
-----------+------------
 Beat      | Hanna
 Bernstein | Mark
 BEET      | JOAN
 doe       | henry
 Doe       | Dennis
 Doe       | Henry
 DOE       | ALFRED
(7 rows)
and if it were case-insensitive, I would expect this:
 last_name | first_name
-----------+------------
 Beat      | Hanna
 BEET      | JOAN
 Bernstein | Mark
 DOE       | ALFRED
 Doe       | Dennis
 doe       | henry
 Doe       | Henry
(7 rows)
What I get instead is a mixture of both, and that baffles me...
For completeness:
# show lc_collate; show lc_ctype;
 lc_collate
-------------
 en_US.UTF-8
(1 row)

 lc_ctype
-------------
 en_US.UTF-8
(1 row)
Natural language collations are more complicated than you might think. They use several comparison levels, where a higher level is only used as a tie-breaker when strings compare equal on all lower levels. Typically, accents and case are ignored on the primary level. On the secondary level, accents are respected, but case is still ignored. On the tertiary level, both case and accents are respected.
So the strings Etat, état and etat would compare identical on the primary level. On the secondary level, état would be greater than the other two, which would be equal. On the tertiary level, etat would be less than Etat. All in all, we end up with
'etat' < 'Etat' < 'état'
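To check this ordering yourself, a query along the following lines could be used. It is only a sketch and assumes it runs in a database whose default collation is a natural language one such as the en_US.UTF-8 shown in the question; the column alias s is just an illustrative name.

```sql
-- Sort the three example strings with the database's default collation.
-- Assumption: the default collation is a natural language collation
-- (for example en_US.UTF-8), not "C" or "POSIX".
SELECT s
FROM (VALUES ('état'), ('Etat'), ('etat')) AS t(s)
ORDER BY s;
```

If the collation behaves as described above, the rows should come back as etat, Etat, état. A different operating system or collation library version may order them differently.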
It is somewhat arbitrary that upper case characters sort after lower case characters here, and with ICU collations you can configure most of these aspects.
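As an illustration of that configurability, the sketch below defines an ICU collation that puts upper case first and applies it to the query from the question. It assumes a PostgreSQL server built with ICU support and an ICU version that accepts BCP 47 locale tags; the collation name upper_first is made up for this example.

```sql
-- Hypothetical collation name; "kf-upper" asks ICU to sort upper case
-- before lower case when strings differ only in case.
-- Assumption: the server was built with ICU support.
CREATE COLLATION upper_first (provider = icu, locale = 'en-u-kf-upper');

-- Apply the collation explicitly to both sort keys.
SELECT last_name, first_name
FROM test
ORDER BY last_name COLLATE upper_first,
         first_name COLLATE upper_first;
```

Case is still only a tie-breaker with this collation, so BEET would continue to sort before Bernstein; only the relative order of doe, Doe and DOE should change.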
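In your example, BEET is less than Bernstein on the primary level (e sorts before r once case is ignored), so case never comes into play for those two strings. The three spellings of Doe, however, are equal on the primary and secondary levels, so the tertiary level decides and case determines their order: doe < Doe < DOE.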
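If you want one of the two simpler behaviours instead, the following queries are one possible way to get them. This is a sketch rather than the only option: the first uses the built-in "C" collation, which compares byte values, and the second folds case with lower() before comparing.

```sql
-- Strictly byte-wise comparison: every upper case ASCII letter sorts
-- before every lower case one, so this yields
-- BEET, Beat, Bernstein, DOE, Doe (Dennis), Doe (Henry), doe.
SELECT last_name, first_name
FROM test
ORDER BY last_name COLLATE "C", first_name COLLATE "C";

-- Case-insensitive comparison: rows that differ only in case
-- (henry doe / Henry Doe) compare equal, so their relative order
-- is not guaranteed.
SELECT last_name, first_name
FROM test
ORDER BY lower(last_name), lower(first_name);
```

Note that the byte-wise order is not the "case-sensitive" order expected in the question: with "C", BEET comes before Beat because E has a smaller byte value than e.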