PostgreSQL: Converting comma-separated integer values and intervals to sequenced numbers

Question

I have a table containing the following:

The value column is a varchar, so it can store any characters, including commas and dashes. Typically, though, it contains only numbers, commas, and dashes that specify intervals.

id | value
---+------------
1  | 1,2,5,8-10
2  | 1,2,3
3  | 1-3
4  | 1-3, 4-5
5  | 1-2,2-3

I want a SELECT query that returns the values in a "normalized", machine-readable comma-separated format, computed at the database level rather than in application code. The result should look something like this:

id | value      | normalized
---+------------+--------------
1  | 1,2,5,8-10 | 1,2,5,8,9,10
2  | 1,2,3      | 1,2,3
3  | 1-3        | 1,2,3
4  | 1-3, 4-5   | 1,2,3,4,5
5  | 1-2,2-3    | 1,2,3

Note the special case of the record with id 5: even though it specifies 2 twice, the result should contain 2 only once. Is there already a PostgreSQL function that does this? If not, how do I parse the strings and sort the numbers in PostgreSQL SQL?

Answer 1

This seems like a good case for a function in your preferred procedural language (PL), or a simple C extension. PL/Perl, PL/Python (plpythonu), or PL/v8 would be my choices.
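
As a rough illustration of the procedural approach, here is a sketch in PL/pgSQL. PL/pgSQL is swapped in here only because it ships with PostgreSQL; it is not one of the languages named above, and it is not necessarily faster than the plain SQL version. The function name normalize_ranges_plpgsql is hypothetical, and the sketch assumes well-formed input (integers and a-b ranges separated by commas).

```sql
-- Hypothetical procedural sketch in PL/pgSQL (not PL/Perl, PL/Python or PL/v8).
-- Assumes well-formed input such as '1,2,5,8-10' or '1-3, 4-5'.
CREATE OR REPLACE FUNCTION normalize_ranges_plpgsql(ranges text)
RETURNS text
LANGUAGE plpgsql
IMMUTABLE STRICT  -- STRICT: NULL input returns NULL without running the body
AS $$
DECLARE
  item   text;
  parts  text[];
  lo     integer;
  hi     integer;
  result integer[] := '{}';
BEGIN
  -- walk each comma-separated item, e.g. '8-10' or ' 4-5'
  FOREACH item IN ARRAY string_to_array(ranges, ',') LOOP
    parts := string_to_array(btrim(item), '-');
    lo := btrim(parts[1])::integer;
    -- a single number has no parts[2], so it becomes the range lo..lo
    hi := coalesce(btrim(parts[2]), btrim(parts[1]))::integer;
    FOR i IN lo .. hi LOOP
      result := result || i;
    END LOOP;
  END LOOP;
  -- remove duplicates, sort numerically, and join with commas
  RETURN array_to_string(
    ARRAY(SELECT DISTINCT x FROM unnest(result) AS x ORDER BY x), ',');
END
$$;
```

Usage: SELECT id, value, normalize_ranges_plpgsql(value) AS normalized FROM example;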

That said, it's easy enough in plain SQL. Split the string to find the subranges, each of which is either a single number or a range. Then run generate_series over each subrange.

For example:

SELECT n 
FROM
   regexp_split_to_table('1,2,5,8-10', ',') subrange,
   regexp_split_to_array(subrange, '-') subrange_parts,
   generate_series(subrange_parts[1]::integer, 
                  coalesce(subrange_parts[2], subrange_parts[1])::integer
   ) n;

which you could wrap up as a SQL function, or use as part of a query over a table.
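
A minimal sketch of such a wrapper, assuming the output you want is the sorted, de-duplicated comma-separated text from the question rather than an array. The name normalize_ranges is hypothetical. It uses the same split / split / generate_series chain as above, adds btrim so that inputs like '1-3, 4-5' tolerate stray spaces, and relies on implicit LATERAL, so it needs PostgreSQL 9.3 or newer.

```sql
-- Hypothetical helper: '1,2,5,8-10' -> '1,2,5,8,9,10'.
-- Assumes well-formed input: integers and "a-b" ranges separated by commas.
CREATE OR REPLACE FUNCTION normalize_ranges(ranges text)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
  -- DISTINCT drops repeats such as the 2 in '1-2,2-3'; ORDER BY sorts numerically
  SELECT array_to_string(array_agg(DISTINCT n ORDER BY n), ',')
  FROM
    -- one row per comma-separated item
    regexp_split_to_table(ranges, ',') AS subrange,
    -- '8-10' -> {8,10}, '5' -> {5}; surrounding spaces trimmed first
    regexp_split_to_array(btrim(subrange), '\s*-\s*') AS subrange_parts,
    -- a single number becomes the one-element range n..n
    generate_series(subrange_parts[1]::integer,
                    coalesce(subrange_parts[2], subrange_parts[1])::integer) AS n
$$;
```

Applied to the table, SELECT id, value, normalize_ranges(value) AS normalized FROM example ORDER BY id; should produce the normalized column in the form the question asks for.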

To apply it to a table, first set up some sample data:

CREATE TABLE example
    ("id" int, "value" varchar)
;

INSERT INTO example
    ("id", "value")
VALUES
    (1, '1,2,5,8-10'),
    (2, '1,2,3'),
    (3, '1-3'),
    (4, '1-3, 4-5'),
    (5, '1-2,2-3')
;

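The query over the table then looks something like this (array_agg is given an explicit ORDER BY, because DISTINCT alone does not guarantee the order of the aggregated elements):

SELECT
  example.id,
  example.value AS original_format,
  array_agg(DISTINCT n ORDER BY n) AS expanded_set
FROM
   example,
   regexp_split_to_table(example.value, ',') subrange,
   regexp_split_to_array(subrange, '-') subrange_parts,
   generate_series(subrange_parts[1]::integer, 
                  coalesce(subrange_parts[2], subrange_parts[1])::integer
   ) n
 GROUP BY
   example.id, example.value
 ORDER BY
   example.id;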

Result (with original col added):

 id | original_format |  expanded_set  
----+-----------------+----------------
  1 | 1,2,5,8-10      | {1,2,5,8,9,10}
  2 | 1,2,3           | {1,2,3}
  3 | 1-3             | {1,2,3}
  4 | 1-3, 4-5        | {1,2,3,4,5}
  5 | 1-2,2-3         | {1,2,3}
(5 rows)

This won't be particularly fast, but it may be fast enough. If it isn't, write something faster as a C extension, or perhaps in PL/Perl.

To understand what's going on, read the PostgreSQL manual sections on:

GROUP BY and aggregation
Aggregate functions, particularly array_agg
DISTINCT as an aggregation qualifier
PostgreSQL arrays, which I use here as an intermediate state and a result
The generate_series function
The regexp_split_to_table and regexp_split_to_array functions
LATERAL queries, which are used implicitly here because one function consumes results from another function in the join list.

The above example will only work in PostgreSQL 9.3 and newer, because LATERAL was introduced in 9.3. On older versions, you have to work around the lack of LATERAL using layers of nested subqueries.
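
A rough sketch of that nested-subquery workaround, wrapped in a function so it can be called once per row. The name normalize_ranges_legacy is hypothetical. Instead of chaining functions in the FROM list, each step is a subquery and the set-returning functions are called in the SELECT list, which older releases allowed. It refers to its argument as $1 because SQL functions could not use parameter names before 9.2. As far as I know, the newest feature it relies on is the aggregate ORDER BY syntax from 9.0, but it is not tested on those old releases.

```sql
-- Hypothetical pre-9.3 variant: no LATERAL, only nested subqueries.
-- Assumes well-formed input such as '1,2,5,8-10'.
CREATE OR REPLACE FUNCTION normalize_ranges_legacy(text)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
  -- outer level: de-duplicate, sort numerically, join with commas
  SELECT array_to_string(array_agg(DISTINCT n ORDER BY n), ',')
  FROM (
    -- middle level: expand each {start,end} pair; a lone number is start..start
    SELECT generate_series(parts[1]::integer,
                           coalesce(parts[2], parts[1])::integer) AS n
    FROM (
      -- inner level: one row per comma-separated item, split on '-'
      SELECT regexp_split_to_array(subrange, '-') AS parts
      FROM regexp_split_to_table($1, ',') AS subrange
    ) AS split_items
  ) AS numbers
$$;
```

Per row it would be used as SELECT id, value, normalize_ranges_legacy(value) AS normalized FROM example ORDER BY id; since the function only ever looks at its own argument, no LATERAL reference to the outer table is needed.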

1 answer

solution1
2 ACCEPTED 2015-07-29 06:24:49