I have a table containing the following. The value column is a varchar, so it can store characters like commas and dashes; anything goes. But typically it contains just numbers, commas, and dashes to specify intervals.
id | value
---+-----------
1  | 1,2,5,8-10
2  | 1,2,3
3  | 1-3
4  | 1-3, 4-5
5  | 1-2,2-3
I want to perform a select query that retrieves the values in a "normalized", code-readable (comma-separated) format at the database level rather than in application code, so the result should look something like this:
id | value      | normalized
---+------------+-------------
1  | 1,2,5,8-10 | 1,2,5,8,9,10
2  | 1,2,3      | 1,2,3
3  | 1-3        | 1,2,3
4  | 1-3, 4-5   | 1,2,3,4,5
5  | 1-2,2-3    | 1,2,3
Note the special case for the record with id 5: even though it specifies 2 twice, the result should contain 2 only once. Is there a function in Postgres that does this already? If not, how do I parse the strings and order the numbers in Postgres SQL?
This seems like a good case for a procedure in your preferred PL, or a simple C extension. pl/perl, pl/pythonu or pl/v8 would be my choices.
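That said, it's easy enough in SQL. Split on commas to find the subranges, each of which is either a single number or a range. Then, for each subrange, run generate_series over it; a single number has no second part, so coalesce falls back to the first part and the series contains just that number.
e.g.: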
SELECT n
FROM
    regexp_split_to_table('1,2,5,8-10', ',') subrange,
    regexp_split_to_array(subrange, '-') subrange_parts,
    generate_series(subrange_parts[1]::integer,
                    coalesce(subrange_parts[2], subrange_parts[1])::integer
    ) n;
You could wrap this up as a SQL function, or use it as part of a query over a table.
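As a rough sketch of the function option, something like the following might work. The function name expand_ranges and the parameter name spec are made up for illustration, and the sketch assumes a PostgreSQL version with implicit LATERAL and named SQL-function parameters. Unlike the bare query, it also removes duplicates and sorts the output.

```sql
-- Hypothetical helper: expands a spec such as '1,2,5,8-10' into one row per number.
CREATE OR REPLACE FUNCTION expand_ranges(spec text)
RETURNS SETOF integer
LANGUAGE sql
AS $$
    SELECT DISTINCT n
    FROM
        -- one row per comma-separated piece, e.g. '8-10'
        regexp_split_to_table(spec, ',') subrange,
        -- split each piece on the dash: {8,10}, or {5} for a single number
        regexp_split_to_array(subrange, '-') subrange_parts,
        -- a single number has no second element, so fall back to the first
        generate_series(subrange_parts[1]::integer,
                        coalesce(subrange_parts[2], subrange_parts[1])::integer) n
    ORDER BY n;
$$;

-- Usage: returns the rows 1, 2, 3 (the duplicate 2 appears once).
SELECT * FROM expand_ranges('1-2,2-3');
```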
Given a sample table like this:
CREATE TABLE example
    ("id" int, "value" varchar)
;
INSERT INTO example
    ("id", "value")
VALUES
    (1, '1,2,5,8-10'),
    (2, '1,2,3'),
    (3, '1-3'),
    (4, '1-3, 4-5'),
    (5, '1-2,2-3')
;
When applied to a table that's something along the lines of:
SELECT
    example.id,
    array_agg(DISTINCT n) AS expanded_set
FROM
    example,
    regexp_split_to_table(example.value, ',') subrange,
    regexp_split_to_array(subrange, '-') subrange_parts,
    generate_series(subrange_parts[1]::integer,
                    coalesce(subrange_parts[2], subrange_parts[1])::integer
    ) n
GROUP BY
    example.id;
Result (with the original column added for comparison):
 id | original_format | expanded_set
----+-----------------+----------------
  1 | 1,2,5,8-10      | {1,2,5,8,9,10}
  2 | 1,2,3           | {1,2,3}
  3 | 1-3             | {1,2,3}
  4 | 1-3, 4-5        | {1,2,3,4,5}
  5 | 1-2,2-3         | {1,2,3}
(5 rows)
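The question asked for a comma-separated string rather than an array. One way to get that, sketched here against the same example table, is to pass the aggregated array through array_to_string. The alias normalized is borrowed from the question, and the ORDER BY inside array_agg is an addition that makes the ordering explicit instead of relying on how DISTINCT happens to be implemented.

```sql
-- Assumes the "example" table defined earlier in this answer.
SELECT
    example.id,
    example.value AS original_format,
    -- join the de-duplicated, sorted numbers back into a comma-separated string
    array_to_string(array_agg(DISTINCT n ORDER BY n), ',') AS normalized
FROM
    example,
    regexp_split_to_table(example.value, ',') subrange,
    regexp_split_to_array(subrange, '-') subrange_parts,
    generate_series(subrange_parts[1]::integer,
                    coalesce(subrange_parts[2], subrange_parts[1])::integer) n
GROUP BY
    example.id, example.value
ORDER BY
    example.id;
```

For the row with id 5 this should yield 1,2,3, with the repeated 2 collapsed by DISTINCT.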
This won't be particularly fast, but it might be OK. If it isn't, write something faster as a C extension, or perhaps in PL/Perl or another procedural language.
To understand what's going on, read the PostgreSQL manual sections on:
- GROUP BY and aggregation
- array_agg
- DISTINCT as an aggregation qualifier
- the generate_series function
- the regexp_split_to_table and regexp_split_to_array functions
- LATERAL queries, which are used implicitly here because one function consumes results from another function in the join list

The above example will only work in PostgreSQL 9.3 and newer, the first release to support LATERAL. If you have an older version you have to work around the lack of LATERAL using layers of nested subqueries.
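For versions without LATERAL, the nesting might look roughly like this. It is a sketch, not something verified on an old server: it assumes the same example table and relies on set-returning functions being allowed in the SELECT list, with each subquery level feeding the next. The aliases s1, s2, s3 and parts are arbitrary.

```sql
-- Assumes the "example" table defined earlier in this answer.
SELECT
    id,
    array_agg(DISTINCT n) AS expanded_set
FROM (
    -- level 3: expand each subrange into its individual numbers
    SELECT
        id,
        generate_series(parts[1]::integer,
                        coalesce(parts[2], parts[1])::integer) AS n
    FROM (
        -- level 2: split each subrange on the dash
        SELECT id, regexp_split_to_array(subrange, '-') AS parts
        FROM (
            -- level 1: one row per comma-separated subrange
            SELECT id, regexp_split_to_table(value, ',') AS subrange
            FROM example
        ) s1
    ) s2
) s3
GROUP BY id;
```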