PostgreSQL ORDER BY issue - natural sort

I've got a Postgres ORDER BY issue with the following table:

em_code  name
EM001    AAA
EM999    BBB
EM1000   CCC

To insert a new record into the table:

  1. I select the last record with SELECT * FROM employees ORDER BY em_code DESC
  2. Extract the letters from em_code using a regular expression and store them in ec_alpha
  3. Cast the remaining part to an integer, ec_num
  4. Increment ec_num by one (ec_num++)
  5. Pad with enough zeros and prefix ec_alpha again

When em_code reaches EM1000, the above algorithm fails.

The first step returns EM999 instead of EM1000, so the algorithm generates EM1000 again as the new em_code, which violates the unique key constraint.
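The question does not show the application code, so here is a small Python sketch that reproduces the steps and the collision. `next_code` is a hypothetical helper, padding to at least three digits is an assumption based on EM001, and plain string sorting stands in for `ORDER BY em_code DESC` on these ASCII-only codes.

```python
import re


def next_code(last_code: str) -> str:
    """Hypothetical version of steps 2-5: split, increment, re-pad."""
    ec_alpha = re.sub(r"[0-9]", "", last_code)  # step 2: keep only the letters
    digits = re.sub(r"[^0-9]", "", last_code)   # the remaining numeric part
    ec_num = int(digits) + 1                    # steps 3-4: cast and increment
    return f"{ec_alpha}{ec_num:03d}"            # step 5: pad to at least 3 digits


codes = ["EM001", "EM999", "EM1000"]

# Step 1: text comparison puts "EM999" first in descending order,
# because the character '9' sorts after '1'.
last = sorted(codes, reverse=True)[0]
new_code = next_code(last)

print(last, new_code, new_code in codes)  # EM999 EM1000 True
```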

Any idea how to select EM1000?

One approach is to create a naturalsort function for this. Here's an example written by Postgres legend RhodiumToad:

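-- Natural-sort key as bytea: digit runs (leading zeros dropped) become
-- length-of-length || length || digits, non-digit runs are kept as-is,
-- and the pieces are joined with a zero byte.
create or replace function naturalsort(text)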
    returns bytea language sql immutable strict as $f$
    select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
    from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;

Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
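To see why this key sorts naturally, here is a Python sketch that approximates the function above (an illustration, not RhodiumToad's code). It assumes a UTF-8 database in which converting to SQL_ASCII passes the bytes through unchanged; `naturalsort_key` is a hypothetical name.

```python
import re

# Same pattern as the SQL: a digit run (leading zeros dropped) or a non-digit run.
TOKEN = re.compile(r"0*([0-9]+)|([^0-9]+)")


def naturalsort_key(s: str) -> bytes:
    parts = []
    for m in TOKEN.finditer(s):
        digits, text = m.group(1), m.group(2)
        if text is not None:
            # Non-digit runs are kept as they are (as UTF-8 bytes).
            parts.append(text.encode("utf-8"))
        else:
            # Digit runs: length of the length, then the length, then the digits,
            # so a shorter number always compares lower than a longer one.
            n = str(len(digits))
            parts.append((str(len(n)) + n + digits).encode("ascii"))
    # Like string_agg(..., '\x00'): join the parts with a zero byte.
    return b"\x00".join(parts)


codes = ["EM1000", "EM001", "EM999"]
print(sorted(codes, key=naturalsort_key, reverse=True)[0])  # EM1000
print(naturalsort_key("EM1000"))  # b'EM\x00141000'
```

Because leading zeros are dropped before encoding, "a01" and "a1" get the same key.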

To use it, simply call the function in your ORDER BY:

SELECT * FROM employees ORDER BY naturalsort(em_code) DESC

The reason is that the strings sort alphabetically (instead of numerically, as you want), and '1' sorts before '9'. You could solve it like this:

SELECT * FROM employees
ORDER  BY substring(em_code, 3)::int DESC;
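The same ordering as a Python sketch. It assumes, as in the example data, that every em_code is a two-letter prefix followed only by digits; `numeric_suffix` is a hypothetical helper name.

```python
def numeric_suffix(em_code: str) -> int:
    # substring(em_code, 3) is 1-based: it starts at the third character,
    # which is em_code[2:] in Python. int() fails on non-digits, much like ::int.
    return int(em_code[2:])


codes = ["EM001", "EM999", "EM1000"]
print(sorted(codes, key=numeric_suffix, reverse=True))
# ['EM1000', 'EM999', 'EM001']
```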

It would be more efficient to drop the redundant 'EM' from em_code, if you can, and store an integer to begin with.

Answer to question in comment

To strip any and all non-digits from a string:

SELECT regexp_replace(em_code, E'\\D','','g')
FROM   employees;

\\D is the regular-expression class shorthand for "non-digits" (inside the E'' escape string, \\ stands for one backslash, so the regex engine sees \D).
'g' as the 4th parameter is the "global" switch that applies the replacement to every occurrence in the string, not just the first.

After replacing every non-digit with the empty string, only digits remain.
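For comparison, the same replacement in Python (a sketch; `strip_non_digits` is a hypothetical name). Python's `re.sub` replaces every match by default, which corresponds to the 'g' flag.

```python
import re


def strip_non_digits(em_code: str) -> str:
    # Equivalent of regexp_replace(em_code, E'\\D', '', 'g'):
    # delete every non-digit character.
    return re.sub(r"\D", "", em_code)


print(strip_non_digits("EM1000"))      # 1000
print(int(strip_non_digits("EM001")))  # 1
```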

This always comes up in questions and in my own development, and I finally got tired of tricky ways of doing it. So I implemented it as a PostgreSQL extension:

https://github.com/Bjond/pg_natural_sort_order

It's free to use (MIT license).

Basically, it normalizes the numbers within strings by left-padding them with zeros, so that you can create an indexed column for full-speed natural sorting. The README explains the details.
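The following Python sketch only illustrates the general zero-padding idea; it is not the extension's code. The helper name `normalize` and the fixed width of 10 digits are assumptions.

```python
import re

PAD_WIDTH = 10  # hypothetical: must be at least as wide as the longest number


def normalize(s: str, width: int = PAD_WIDTH) -> str:
    # Left-pad every run of ASCII digits with zeros to a fixed width, so that
    # plain string comparison (and thus an ordinary index) sorts naturally.
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), s)


codes = ["EM001", "EM999", "EM1000"]
print(normalize("EM1000"))           # EM0000001000
print(sorted(codes, key=normalize))  # ['EM001', 'EM999', 'EM1000']
```

Storing the normalized value in its own column is what makes the sort indexable: the database only compares ordinary strings.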

The advantage is that a trigger can do the work instead of your application code. The value is computed on the PostgreSQL server at machine speed, and migrations that add such columns become simple and fast.

You could simply use this one line: "ORDER BY length(substring(em_code FROM '[0-9]+')), em_code"
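A Python sketch of that two-level ordering (`length_then_code` is a hypothetical name). It assumes every code contains digits, since the SQL expression would be NULL otherwise, and that codes share a prefix and consistent zero-padding, as in the example data. To fetch the last record, as the question needs, both keys would be sorted descending: ORDER BY length(substring(em_code FROM '[0-9]+')) DESC, em_code DESC.

```python
import re


def length_then_code(em_code: str):
    # Mirrors ORDER BY length(substring(em_code FROM '[0-9]+')), em_code:
    # first compare the length of the first digit run, then the whole string.
    m = re.search(r"[0-9]+", em_code)
    return (len(m.group()), em_code)


codes = ["EM1000", "EM999", "EM001"]
print(sorted(codes, key=length_then_code))  # ['EM001', 'EM999', 'EM1000']
```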

I wrote about this in detail in this related question:

Humanized or natural number sorting of mixed word-and-number strings

(I'm posting this answer as a useful cross-reference only, so it's community wiki.)

Since PostgreSQL 10, which added ICU collation support, you can specify a collation that sorts numbers within a column naturally.

https://www.postgresql.org/docs/10/collation.html

-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');

-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;

Now just query as you normally would:

SELECT * FROM employees ORDER BY em_code

On my data, I get results in this order (note that it also sorts non-ASCII numerals, such as ۱۳):

Value
0
0001
001
1
06
6
13
۱۳
14

I came up with something slightly different.

The basic idea is to build an array of (integer, string) tuples and then order by it. The magic number 2147483647 is the maximum 32-bit integer; it is used so that text parts sort after numbers.

  ORDER BY ARRAY(
    SELECT ROW(
      CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
      match[2]
    )
    FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
    AS match
  )
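A Python sketch of the same key, for illustration (`array_key` is a hypothetical name). Python's regex alternation takes the first branch that matches rather than the longest match, so the sketch uses `(\d+)|(\D+)` to get one whole digit run or one whole non-digit run per match. It uses "" where the SQL row would hold NULL, and, unlike the SQL CAST to INTEGER, it does not fail on numbers above 2147483647.

```python
import re

INT32_MAX = 2147483647  # same sentinel as the SQL: text sorts after numbers


def array_key(s: str) -> list:
    key = []
    for m in re.finditer(r"(\d+)|(\D+)", s):
        digits, text = m.group(1), m.group(2)
        if digits is not None:
            key.append((int(digits), ""))    # numeric part, empty text
        else:
            key.append((INT32_MAX, text))    # text part after all numbers
    return key


print(sorted(["b", "a10", "a2", "10"], key=array_key))  # ['10', 'a2', 'a10', 'b']
```

Lists, like PostgreSQL arrays, compare element by element, so "a2" sorts before "a10".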

I thought about another way of doing this that uses less database storage than padding and is faster than calculating the sort key on the fly:

https://stackoverflow.com/a/47522040/935122

I've also put it on GitHub:

https://github.com/ccsalway/dbNaturalSort

The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:

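-- Natural-sort key as text: each match becomes its non-digit part, a \x01
-- separator, and its digit part left-padded with zeros to 20 characters.
create function natsort(s text) returns text immutable language sql as $$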
  select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
  from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;

The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution and is trivial to index.

Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitably larger length (lpad truncates longer strings, so longer numbers would no longer sort correctly). This directly affects the length of the resulting strings, so don't make the value larger than needed.
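To see why the width matters, here is a Python sketch that mirrors natsort (an illustration, not the author's code; `pg_lpad` is a hypothetical helper imitating PostgreSQL's lpad). It compares strings by code point, as the C collation does.

```python
import re


def pg_lpad(s: str, length: int, fill: str) -> str:
    # Like PostgreSQL lpad(): left-pad to `length`, but truncate (on the right)
    # when the string is already longer than `length`.
    if len(s) >= length:
        return s[:length]
    return fill * (length - len(s)) + s


def natsort(s: str, width: int = 20) -> str:
    # Mirrors the SQL: each (non-digits)(digits) match becomes
    # text || '\x01' || digits left-padded to `width` characters.
    return "".join(text + "\x01" + pg_lpad(digits, width, "0")
                   for text, digits in re.findall(r"(\D*)(\d*)", s))


codes = ["EM1000", "EM999", "EM001"]
print(sorted(codes, key=natsort))  # ['EM001', 'EM999', 'EM1000']

# With a width smaller than the numbers, distinct values collapse to one key.
print(natsort("1000", width=3) == natsort("1009", width=3))  # True
```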
