
How to find the difference between arrays in consecutive rows in Snowflake SQL?

I am trying to compare each array with the array in the following row of the same column (grouped by Id), and return the differing elements and their count in separate columns. I want results similar to those in the linked question "Compare two arrays and count number of the same strings". There, however, the arrays are compared between columns, whereas I would like to compare between rows.

So, my initial dataset might look like this

| Id | column 1           |
|----|--------------------|
| 1  | [cat, dog, bird]   |
| 1  | [cat, bird]        |
| 1  | [cat, bear, tiger] |
| 1  | [cat, tiger]       |
| 2  | [cat, tiger]       |
| 2  | [cat, bear, tiger] |
| 2  | [cat, bird]        |
| 3  | [tiger]            |
| 3  | [cat, bird]        |

So, my final dataset might look something like this.

| Id | column 1           | column 2      | column 3 |
|----|--------------------|---------------|----------|
| 1  | [cat, dog, bird]   | [dog]         | 1        |
| 1  | [cat, bird]        | [bird]        | 1        |
| 1  | [cat, bear, tiger] | [bear]        | 1        |
| 1  | [cat, tiger]       | [cat, tiger]  |          |
| 2  | [cat, tiger]       |               | 0        |
| 2  | [cat, bear, tiger] | [bear, tiger] | 2        |
| 2  | [cat, bird]        | [cat, bird]   |          |
| 3  | [tiger]            | [tiger]       | 1        |
| 3  | [cat, bird]        | [cat, bird]   |          |

Column 2 contains the elements of the current row's array that do not appear in the next row's array (within the same Id), and column 3 contains the number of elements in column 2.

Thank You

Here is a LEAD example that pairs each row with the next row of the same id:

with data(id, _order, col1) as (
    select column1, column2, parse_json(column3)
    from values
        (1, 1, '["cat", "dog", "bird"]'),
        (1, 2, '["cat", "bird"]'),
        (1, 3, '["cat", "bear", "tiger"]'),
        (1, 4, '["cat", "tiger"]'),
        (2, 5, '["cat", "tiger"]'),
        (2, 6, '["cat", "bear", "tiger"]'),
        (2, 7, '["cat", "bird"]'),
        (3, 8, '["tiger"]'),
        (3, 9, '["cat", "bird"]')
)
select *
    ,lead(col1) over (partition by id order by _order ) as lead_col1
from data
order by _order;
| ID | _ORDER | COL1                       | LEAD_COL1                  |
|----|--------|----------------------------|----------------------------|
| 1  | 1      | [ "cat", "dog", "bird" ]   | [ "cat", "bird" ]          |
| 1  | 2      | [ "cat", "bird" ]          | [ "cat", "bear", "tiger" ] |
| 1  | 3      | [ "cat", "bear", "tiger" ] | [ "cat", "tiger" ]         |
| 1  | 4      | [ "cat", "tiger" ]         | null                       |
| 2  | 5      | [ "cat", "tiger" ]         | [ "cat", "bear", "tiger" ] |
| 2  | 6      | [ "cat", "bear", "tiger" ] | [ "cat", "bird" ]          |
| 2  | 7      | [ "cat", "bird" ]          | null                       |
| 3  | 8      | [ "tiger" ]                | [ "cat", "bird" ]          |
| 3  | 9      | [ "cat", "bird" ]          | null                       |

Anyway, here is how I would do it:

with data(id, col1) as (
    select column1, parse_json(column2)
    from values
        (1, '["cat", "dog", "bird"]'),
        (1, '["cat", "bird"]'),
        (1, '["cat", "bear", "tiger"]'),
        (1, '["cat", "tiger"]'),
        (2, '["cat", "tiger"]'),
        (2, '["cat", "bear", "tiger"]'),
        (2, '["cat", "bird"]'),
        (3, '["tiger"]'),
        (3, '["cat", "bird"]')
), flatten as (
    select
        d.id, 
        dense_rank() over(order by f.seq) as seq, 
        f.index, 
        f.value as val
    from data as d,
    table(flatten(input=>d.col1)) f
)
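select
    a.id
    ,a.seq
    -- rebuild the original array in its original element order
    ,array_agg(a.val) within group (order by a.index) as col1
    -- keep a.val only when no matching element exists in the next row (b.val is null);
    -- array_agg skips the nulls produced for matched elements
    ,array_agg(nvl2(a.val, nvl2(b.val, null, a.val), b.val)) within group (order by a.index) as col2
    ,array_size(col2) as col3
from flatten as a
-- b is the next row (seq + 1) of the same id, matched element by element
full outer join flatten as b
    on a.id = b.id and a.seq + 1 = b.seq and a.val = b.val
where a.id is not null
group by a.id, a.seq
order by a.seq;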
    
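| ID | SEQ | COL1                       | COL2                | COL3 |
|----|-----|----------------------------|---------------------|------|
| 1  | 1   | [ "cat", "dog", "bird" ]   | [ "dog" ]           | 1    |
| 1  | 2   | [ "cat", "bird" ]          | [ "bird" ]          | 1    |
| 1  | 3   | [ "cat", "bear", "tiger" ] | [ "bear" ]          | 1    |
| 1  | 4   | [ "cat", "tiger" ]         | [ "cat", "tiger" ]  | 2    |
| 2  | 5   | [ "cat", "tiger" ]         | []                  | 0    |
| 2  | 6   | [ "cat", "bear", "tiger" ] | [ "bear", "tiger" ] | 2    |
| 2  | 7   | [ "cat", "bird" ]          | [ "cat", "bird" ]   | 2    |
| 3  | 8   | [ "tiger" ]                | [ "tiger" ]         | 1    |
| 3  | 9   | [ "cat", "bird" ]          | [ "cat", "bird" ]   | 2    |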
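
A possibly shorter variant, offered only as a sketch: combine LEAD with Snowflake's ARRAY_EXCEPT function instead of flattening and joining. It assumes an explicit ordering column (the `_order` column from the LEAD example), that the arrays are cast to the ARRAY type, and that the names `paired` and `next_col1` are just illustrative. I have not run it against the data above, so treat the behaviour described afterwards as expected rather than verified.

```sql
with data(id, _order, col1) as (
    -- cast the parsed JSON to ARRAY, since ARRAY_EXCEPT expects ARRAY arguments
    select column1, column2, parse_json(column3)::array
    from values
        (1, 1, '["cat", "dog", "bird"]'),
        (1, 2, '["cat", "bird"]'),
        (1, 3, '["cat", "bear", "tiger"]'),
        (1, 4, '["cat", "tiger"]'),
        (2, 5, '["cat", "tiger"]'),
        (2, 6, '["cat", "bear", "tiger"]'),
        (2, 7, '["cat", "bird"]'),
        (3, 8, '["tiger"]'),
        (3, 9, '["cat", "bird"]')
), paired as (
    -- attach the next row's array within the same id (hypothetical CTE name)
    select
        id,
        _order,
        col1,
        lead(col1) over (partition by id order by _order) as next_col1
    from data
)
select
    id,
    col1,
    -- elements of col1 that are not present in the next row's array
    array_except(col1, next_col1) as col2,
    array_size(array_except(col1, next_col1)) as col3
from paired
order by _order;
```

For the last row of each id, `next_col1` is null, so `col2` and `col3` should come back as null, which matches the blank count in the question. To return the full array for those rows instead, as the flatten/join query does, `next_col1` could be wrapped as `coalesce(next_col1, array_construct())`. Note that ARRAY_EXCEPT treats arrays as multisets, so repeated elements are removed one occurrence at a time.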
