PostgreSQL: Converting comma-separated integer values and intervals to sequenced numbers

I have a table containing the following:

The value column is a varchar, so it can hold any string, but in practice it contains only numbers, with commas as separators and dashes to specify intervals.

id | value      | 
------------------
1  | 1,2,5,8-10 |
2  | 1,2,3      |
3  | 1-3        |
4  | 1-3, 4-5   |
5  | 1-2,2-3    |

I want a SELECT query that returns the values in a "normalized", code-readable (comma-separated) format, done at the database level rather than in application code. The result should look something like this:

id | value      | normalized
--------------------------------
1  | 1,2,5,8-10 | 1,2,5,8,9,10
2  | 1,2,3      | 1,2,3
3  | 1-3        | 1,2,3
4  | 1-3, 4-5   | 1,2,3,4,5
5  | 1-2,2-3    | 1,2,3

Note the special case for the record with id 5: even though 2 is specified twice, it should appear only once in the result. Is there a function in Postgres that already does this? If not, how do I parse the strings and order the numbers in Postgres SQL?

This seems like a good case for a procedure in your preferred PL, or a simple C extension. PL/Perl, PL/Python (plpythonu) or PL/v8 would be my choices.

That said, it's easy enough in SQL. Split on commas to find the subranges, each of which is either a single number or a range. Then, for each subrange, run generate_series over it.

For example:

SELECT n 
FROM
   regexp_split_to_table('1,2,5,8-10', ',') subrange,
   regexp_split_to_array(subrange, '-') subrange_parts,
   generate_series(subrange_parts[1]::integer, 
                  coalesce(subrange_parts[2], subrange_parts[1])::integer
   ) n;

which you could wrap up as a SQL function, or use as part of a query over a table.
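
As a minimal sketch of the "wrap it up as a SQL function" option: the function name expand_ranges and its parameter name spec are hypothetical, chosen only for illustration. It assumes PostgreSQL 9.3 or newer and well-formed input (only integers, commas and dashes); anything else, including an empty string, will raise a cast error.

```sql
-- Hypothetical helper: expands '1,2,5,8-10' into a sorted, de-duplicated integer array.
CREATE OR REPLACE FUNCTION expand_ranges(spec text)
RETURNS integer[]
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT array_agg(DISTINCT n ORDER BY n)
    FROM
        -- one row per comma-separated piece, e.g. '8-10'
        regexp_split_to_table(spec, ',') AS subrange,
        -- split each piece into its lower and (optional) upper bound
        regexp_split_to_array(subrange, '-') AS subrange_parts,
        -- a single number has no second element, so reuse the first as the upper bound
        generate_series(subrange_parts[1]::integer,
                        coalesce(subrange_parts[2], subrange_parts[1])::integer) AS n;
$$;

-- Usage on a literal:
SELECT expand_ranges('1,2,5,8-10');
```

The explicit ORDER BY inside array_agg is there so the element order does not depend on how DISTINCT happens to be implemented.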

Given a sample table like:

CREATE TABLE example
    ("id" int, "value" varchar)
;

INSERT INTO example
    ("id", "value")
VALUES
    (1, '1,2,5,8-10'),
    (2, '1,2,3'),
    (3, '1-3'),
    (4, '1-3, 4-5'),
    (5, '1-2,2-3')
;

When applied to a table that's something along the lines of:

SELECT
  example.id,
  array_agg(DISTINCT n) AS expanded_set
FROM
   example,
   regexp_split_to_table(example.value, ',') subrange,
   regexp_split_to_array(subrange, '-') subrange_parts,
   generate_series(subrange_parts[1]::integer, 
                  coalesce(subrange_parts[2], subrange_parts[1])::integer
   ) n
 GROUP BY
   example.id;

Result (with the original column added for comparison):

 id | original_format |  expanded_set  
----+-----------------+----------------
  1 | 1,2,5,8-10      | {1,2,5,8,9,10}
  2 | 1,2,3           | {1,2,3}
  3 | 1-3             | {1,2,3}
  4 | 1-3, 4-5        | {1,2,3,4,5}
  5 | 1-2,2-3         | {1,2,3}
(5 rows)
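
The question asks for a comma-separated string rather than an array. One possible variation, sketched here against the same example table, selects the original column as well and converts the array with array_to_string. The column aliases original_format and normalized are just labels picked to match the outputs shown in this thread.

```sql
SELECT
  example.id,
  example.value AS original_format,
  -- sort explicitly, then join the elements with commas
  array_to_string(array_agg(DISTINCT n ORDER BY n), ',') AS normalized
FROM
   example,
   regexp_split_to_table(example.value, ',') subrange,
   regexp_split_to_array(subrange, '-') subrange_parts,
   generate_series(subrange_parts[1]::integer,
                   coalesce(subrange_parts[2], subrange_parts[1])::integer
   ) n
GROUP BY
   example.id, example.value
ORDER BY
   example.id;
```

The original column has to appear in GROUP BY (or id has to be declared as the primary key) for it to be selectable alongside the aggregate.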

This won't be particularly fast, but it might be OK. If not, write something faster in C as an extension, or maybe plperl or something.

To understand what's going on, read the PostgreSQL manual sections on:

  • GROUP BY and aggregation
  • Aggregate functions, particularly array_agg
  • DISTINCT as an aggregation qualifier
  • PostgreSQL arrays, which I use here as an intermediate state and a result
  • The generate_series function
  • The regexp_split_to_table and regexp_split_to_array functions
  • LATERAL queries, which are used implicitly here because one function consumes results from another function in the join list.
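
To make the implicit LATERAL visible, the same query can be spelled with explicit joins. This is only an equivalent rewrite for illustration, assuming the example table from above; the comma-separated form behaves the same way.

```sql
SELECT
  example.id,
  array_agg(DISTINCT n) AS expanded_set
FROM example
  -- each function call may refer to columns produced by the items to its left
  CROSS JOIN LATERAL regexp_split_to_table(example.value, ',') AS subrange
  CROSS JOIN LATERAL regexp_split_to_array(subrange, '-') AS subrange_parts
  CROSS JOIN LATERAL generate_series(
      subrange_parts[1]::integer,
      coalesce(subrange_parts[2], subrange_parts[1])::integer
  ) AS n
GROUP BY
  example.id;
```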

The above example will only work in PostgreSQL 9.3 and newer, the first release to support LATERAL. If you have an older version you have to work around the lack of LATERAL using layers of nested subqueries.
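
A rough sketch of what such a nested-subquery workaround could look like, using set-returning functions in the SELECT list instead of the FROM clause. The subquery aliases (pieces, bounds, expanded) are made up for illustration, and I have not checked this against every old release, so treat it as a starting point.

```sql
SELECT
  id,
  array_agg(DISTINCT n) AS expanded_set
FROM (
    -- innermost level first: split on commas, then on dashes, then expand
    SELECT
      id,
      generate_series(parts[1]::integer,
                      coalesce(parts[2], parts[1])::integer) AS n
    FROM (
        SELECT id, regexp_split_to_array(subrange, '-') AS parts
        FROM (
            SELECT id, regexp_split_to_table(value, ',') AS subrange
            FROM example
        ) AS pieces
    ) AS bounds
) AS expanded
GROUP BY
  id;
```

Each level has exactly one set-returning function in its SELECT list, which keeps the row-multiplication behaviour easy to reason about.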
