I am trying to compare each array with the array in the following row of the same column (within the same group), and return the resulting array and its element count in separate columns. I want results similar to those in the question "Compare two arrays and count number of the same strings", but there the arrays are compared between columns, whereas I would like to compare between rows.
So, my initial dataset might look like this
| Id | column 1 |
|----|--------------------|
| 1 | [cat, dog, bird] |
| 1 | [cat, bird] |
| 1 | [cat, bear, tiger] |
| 1 | [cat, tiger] |
| 2 | [cat, tiger] |
| 2 | [cat, bear, tiger] |
| 2 | [cat, bird] |
| 3 | [tiger] |
| 3 | [cat, bird] |
So, my final dataset might look something like this.
| Id | column 1 | column 2 | column 3 |
|----|--------------------|---------------|----------|
| 1 | [cat, dog, bird] | [dog] | 1 |
| 1 | [cat, bird] | [bird] | 1 |
| 1 | [cat, bear, tiger] | [bear] | 1 |
| 1 | [cat, tiger] | [cat, tiger] | |
| 2 | [cat, tiger] | | 0 |
| 2 | [cat, bear, tiger] | [bear, tiger] | 2 |
| 2 | [cat, bird] | [cat, bird] | |
| 3 | [tiger] | [tiger] | 1 |
| 3 | [cat, bird] | [cat, bird] | |
Column 2 contains the elements of the current row's array that do not appear in the next row's array (within the same Id), and column 3 contains the number of elements in column 2.
Thank You
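Here is an example that uses LEAD to fetch the next row's array within each id. Note that an explicit ordering column (`_order` here) is needed, because rows in a table have no inherent order: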
with data(id, _order, col1) as (
select column1, column2, parse_json(column3)
from values
(1, 1, '["cat", "dog", "bird"]'),
(1, 2, '["cat", "bird"]'),
(1, 3, '["cat", "bear", "tiger"]'),
(1, 4, '["cat", "tiger"]'),
(2, 5, '["cat", "tiger"]'),
(2, 6, '["cat", "bear", "tiger"]'),
(2, 7, '["cat", "bird"]'),
(3, 8, '["tiger"]'),
(3, 9, '["cat", "bird"]')
)
select *
,lead(col1) over (partition by id order by _order ) as lead_col1
from data
order by _order;
ID | _ORDER | COL1 | LEAD_COL1 |
---|---|---|---|
1 | 1 | [ "cat", "dog", "bird" ] | [ "cat", "bird" ] |
1 | 2 | [ "cat", "bird" ] | [ "cat", "bear", "tiger" ] |
1 | 3 | [ "cat", "bear", "tiger" ] | [ "cat", "tiger" ] |
1 | 4 | [ "cat", "tiger" ] | null |
2 | 5 | [ "cat", "tiger" ] | [ "cat", "bear", "tiger" ] |
2 | 6 | [ "cat", "bear", "tiger" ] | [ "cat", "bird" ] |
2 | 7 | [ "cat", "bird" ] | null |
3 | 8 | [ "tiger" ] | [ "cat", "bird" ] |
3 | 9 | [ "cat", "bird" ] | null |
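
For readers less familiar with window functions, the behaviour of `lead(col1) over (partition by id order by _order)` can be sketched in plain Python. This is only an illustration of the semantics, not Snowflake code; the function name `lead_within_group` and the tuple layout are assumptions made for this example.

```python
from itertools import groupby
from operator import itemgetter


def lead_within_group(rows):
    """Emulate LEAD(array) OVER (PARTITION BY id ORDER BY order).

    rows: list of (id, order, array) tuples (hypothetical layout).
    Returns a list of (id, order, array, next_array) tuples, where
    next_array is None for the last row of each id.
    """
    out = []
    # Sort by id, then by the ordering column, so groupby sees each id once.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, grp in groupby(ordered, key=itemgetter(0)):
        grp = list(grp)
        for i, (id_, order, arr) in enumerate(grp):
            # The "lead" value is the array of the next row in the same group.
            nxt = grp[i + 1][2] if i + 1 < len(grp) else None
            out.append((id_, order, arr, nxt))
    return out


rows = [
    (1, 1, ["cat", "dog", "bird"]),
    (1, 2, ["cat", "bird"]),
    (2, 5, ["cat", "tiger"]),
]
for row in lead_within_group(rows):
    print(row)
```

As in the SQL output, the last row of every id has no following row, so its lead value is empty (`None` here, `null` in Snowflake).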
Anyway, here is how I would do it:
with data(id, col1) as (
select column1, parse_json(column2)
from values
(1, '["cat", "dog", "bird"]'),
(1, '["cat", "bird"]'),
(1, '["cat", "bear", "tiger"]'),
(1, '["cat", "tiger"]'),
(2, '["cat", "tiger"]'),
(2, '["cat", "bear", "tiger"]'),
(2, '["cat", "bird"]'),
(3, '["tiger"]'),
(3, '["cat", "bird"]')
), flatten as (
select
d.id,
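-- f.seq identifies the input row each flattened element came from,
-- so dense_rank() turns it into a consecutive row number (1, 2, 3, ...)
dense_rank() over(order by f.seq) as seq,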
f.index,
f.value as val
from data as d,
table(flatten(input=>d.col1)) f
)
select
a.id
,a.seq
,array_agg(a.val) within group (order by a.index) as col1
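-- keep a.val only when the next row has no matching value (b.val is null);
-- array_agg omits the resulting nulls, leaving only the unmatched elements
,array_agg(nvl2(a.val, nvl2(b.val, null, a.val), b.val)) within group (order by a.index) as col2
,array_size(col2) as col3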
from flatten as a
full outer join flatten as b
on a.id = b.id and a.seq + 1 = b.seq and a.val = b.val
where a.id is not null
group by a.id, a.seq
order by a.seq;
ID | SEQ | COL1 | COL2 | COL3 |
---|---|---|---|---|
1 | 1 | [ "cat", "dog", "bird" ] | [ "dog" ] | 1 |
1 | 2 | [ "cat", "bird" ] | [ "bird" ] | 1 |
1 | 3 | [ "cat", "bear", "tiger" ] | [ "bear" ] | 1 |
1 | 4 | [ "cat", "tiger" ] | [ "cat", "tiger" ] | 2 |
2 | 5 | [ "cat", "tiger" ] | [] | 0 |
2 | 6 | [ "cat", "bear", "tiger" ] | [ "bear", "tiger" ] | 2 |
2 | 7 | [ "cat", "bird" ] | [ "cat", "bird" ] | 2 |
3 | 8 | [ "tiger" ] | [ "tiger" ] | 1 |
3 | 9 | [ "cat", "bird" ] | [ "cat", "bird" ] | 2 |
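
The result above can be cross-checked outside Snowflake with a small Python sketch of the same comparison: for each row, keep the elements that are absent from the next row of the same id. The function name `diff_with_next` and the dictionary layout are assumptions for illustration. Like the query, it treats the last row of each id as having nothing to compare against, so all of its elements are kept.

```python
def diff_with_next(groups):
    """groups: dict mapping id -> list of arrays in row order (hypothetical layout).

    Returns a list of (id, array, missing_in_next, count) tuples, where
    missing_in_next holds the elements of the current array that do not
    appear in the next array of the same id.
    """
    result = []
    for id_, arrays in groups.items():
        for i, current in enumerate(arrays):
            # The last row of a group is compared against an empty array.
            nxt = arrays[i + 1] if i + 1 < len(arrays) else []
            missing = [v for v in current if v not in nxt]
            result.append((id_, current, missing, len(missing)))
    return result


groups = {
    1: [["cat", "dog", "bird"], ["cat", "bird"],
        ["cat", "bear", "tiger"], ["cat", "tiger"]],
    2: [["cat", "tiger"], ["cat", "bear", "tiger"], ["cat", "bird"]],
    3: [["tiger"], ["cat", "bird"]],
}
for row in diff_with_next(groups):
    print(row)
```

With the sample data this yields the same COL2 and COL3 values as the query output, for example `['dog']` with count 1 for the first row and an empty list with count 0 for the first row of id 2.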