
PostgreSQL: Converting comma-separated integer values and intervals to sequenced numbers

I have a table whose value column is a varchar, so it can hold characters such as commas and dashes; in principle anything goes. Typically, though, it contains just numbers, commas, and dashes that specify intervals. The table contains the following:

id | value      | 
------------------
1  | 1,2,5,8-10 |
2  | 1,2,3      |
3  | 1-3        |
4  | 1-3, 4-5   |
5  | 1-2,2-3    |

I want a select query that returns the values in a "normalized", code-readable format (comma-separated), done at the database level rather than in application code. The result should look something like this:

id | value      | normalized
-------------------------------
1  | 1,2,5,8-10 |1,2,5,8,9,10
2  | 1,2,3      |1,2,3
3  | 1-3        |1,2,3
4  | 1-3, 4-5   |1,2,3,4,5
5  | 1-2,2-3    |1,2,3

Note the special case for the record with id 5: even though it specifies 2 twice, 2 should be returned only once. Is there a function in Postgres that does this already? If not, how do I parse the strings and order the numbers in Postgres SQL?

This seems like a good case for a procedure in your preferred PL, or a simple C extension. PL/Perl, PL/Python (plpythonu) or PL/v8 would be my choices.

That said, it's easy enough in SQL. Split the string on commas to find the subranges, each of which can be a single number or a range. Then, for each subrange, run generate_series over it.

For example:

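SELECT n
FROM
   -- one row per comma-separated piece: '1', '2', '5', '8-10'
   regexp_split_to_table('1,2,5,8-10', ',') subrange,
   -- split each piece on the dash: {1}, {2}, {5}, {8,10}
   regexp_split_to_array(subrange, '-') subrange_parts,
   -- expand from the first part to the second; a lone number has no
   -- second part, so coalesce falls back to the first
   generate_series(subrange_parts[1]::integer,
                   coalesce(subrange_parts[2], subrange_parts[1])::integer
   ) n;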

You could wrap this up as an SQL function, or use it as part of a query over a table.
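As a rough sketch of the "wrap it up as a function" option, the same query can sit inside a plain SQL function. The name expand_ranges and its parameter spec are made up for illustration and are not built-ins. The ORDER BY inside array_agg is an addition so that the element order is explicit.

```sql
-- Hypothetical helper: expand_ranges and spec are illustrative names.
CREATE OR REPLACE FUNCTION expand_ranges(spec text)
RETURNS integer[]
LANGUAGE sql
AS $$
    -- DISTINCT removes values repeated by overlapping ranges;
    -- ORDER BY makes the element order explicit.
    SELECT array_agg(DISTINCT n ORDER BY n)
    FROM
        regexp_split_to_table(spec, ',') AS subrange,
        regexp_split_to_array(subrange, '-') AS subrange_parts,
        generate_series(subrange_parts[1]::integer,
                        coalesce(subrange_parts[2], subrange_parts[1])::integer
        ) AS n;
$$;

-- Usage
SELECT expand_ranges('1,2,5,8-10');   -- {1,2,5,8,9,10}
```

Calling it per row, as in SELECT id, expand_ranges(value) FROM example, keeps the splitting logic in one place.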

To try it against a table, start with some sample data:

CREATE TABLE example
    ("id" int, "value" varchar)
;

INSERT INTO example
    ("id", "value")
VALUES
    (1, '1,2,5,8-10'),
    (2, '1,2,3'),
    (3, '1-3'),
    (4, '1-3, 4-5'),
    (5, '1-2,2-3')
;

Applied to that table, the query looks something like this:

SELECT
   example.id,
   -- DISTINCT drops values that overlapping ranges repeat, as in '1-2,2-3'
   array_agg(DISTINCT n) AS expanded_set
FROM
   example,
   regexp_split_to_table(example.value, ',') subrange,
   regexp_split_to_array(subrange, '-') subrange_parts,
   generate_series(subrange_parts[1]::integer,
                   coalesce(subrange_parts[2], subrange_parts[1])::integer
   ) n
GROUP BY
   example.id;

Result (with the original column added):

 id | original_format |  expanded_set  
----+-----------------+----------------
  1 | 1,2,5,8-10      | {1,2,5,8,9,10}
  2 | 1,2,3           | {1,2,3}
  3 | 1-3             | {1,2,3}
  4 | 1-3, 4-5        | {1,2,3,4,5}
  5 | 1-2,2-3         | {1,2,3}
(5 rows)
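The question asks for a comma-separated string rather than an array. One way to get that, sketched here as a variation and not as part of the original answer, is to pass the aggregated array through array_to_string. The normalized column name is taken from the question. The ORDER BY inside array_agg is added so the numeric order does not depend on how DISTINCT happens to be evaluated. The stray space in '1-3, 4-5' should not matter, because the integer cast accepts surrounding whitespace.

```sql
SELECT
    example.id,
    example.value,
    -- Sort numerically while n is still an integer, then join with commas.
    array_to_string(array_agg(DISTINCT n ORDER BY n), ',') AS normalized
FROM
    example,
    regexp_split_to_table(example.value, ',') AS subrange,
    regexp_split_to_array(subrange, '-') AS subrange_parts,
    generate_series(subrange_parts[1]::integer,
                    coalesce(subrange_parts[2], subrange_parts[1])::integer
    ) AS n
GROUP BY
    example.id, example.value
ORDER BY
    example.id;
```

Sorting before converting to text matters: ordering the text form would place 10 before 2.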

This won't be particularly fast, but it might be OK. If not, write something faster in C as an extension, or maybe in PL/Perl or similar.

To understand what's going on, read the PostgreSQL manual sections on:

  • GROUP BY and aggregation
  • Aggregate functions, particularly array_agg
  • DISTINCT as an aggregation qualifier
  • PostgreSQL arrays, which I use here as an intermediate state and a result
  • The generate_series function
  • The regexp_split_to_table and regexp_split_to_array functions
  • LATERAL queries, which are used implicitly here because one function consumes results from another function in the join list.
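To make the implicit LATERAL visible, here is the single-string query from earlier with the keyword spelled out. This is only a restatement for illustration; the comma-separated form should behave the same way, since function calls in FROM may refer to earlier FROM items without the keyword.

```sql
SELECT n
FROM
    regexp_split_to_table('1,2,5,8-10', ',') AS subrange
    -- Each LATERAL item may refer to columns of the items to its left.
    CROSS JOIN LATERAL regexp_split_to_array(subrange, '-') AS subrange_parts
    CROSS JOIN LATERAL generate_series(
        subrange_parts[1]::integer,
        coalesce(subrange_parts[2], subrange_parts[1])::integer
    ) AS n;
```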

The above example will only work in PostgreSQL 9.3 and newer, since that is the release that introduced LATERAL. If you have an older version you have to work around the lack of LATERAL using layers of nested subqueries.
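A sketch of what that nested-subquery workaround might look like is below. It relies on calling set-returning functions in the SELECT list, one layer per step, so that each layer can see the output of the one inside it. The subquery aliases (pieces, split_pieces, expanded) and the parts column are illustrative names, and this has not been checked against any particular old release.

```sql
SELECT id, array_agg(DISTINCT n) AS expanded_set
FROM (
    -- Layer 3: expand each piece into its individual numbers.
    SELECT id,
           generate_series(parts[1]::integer,
                           coalesce(parts[2], parts[1])::integer) AS n
    FROM (
        -- Layer 2: split each piece on the dash.
        SELECT id, regexp_split_to_array(subrange, '-') AS parts
        FROM (
            -- Layer 1: one row per comma-separated piece.
            SELECT example.id,
                   regexp_split_to_table(example.value, ',') AS subrange
            FROM example
        ) AS pieces
    ) AS split_pieces
) AS expanded
GROUP BY id;
```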


 