How to find the difference between following rows in an array in Snowflake SQL?

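I am trying to compare the arrays in two consecutive rows of the same column (within each group) and return the differing elements and their count in separate columns. I would like a result similar to the one in the linked question (comparing two arrays and counting the number of identical strings), but there the arrays are compared across columns, whereas I want to compare them across rows.

So, my initial dataset might look like this: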

|Id        |      column 1       |
|----------|---------------------|
|1         |  [cat, dog, bird]   |
|1         |  [cat, bird]        |
|1         |  [cat, bear, tiger] |
|1         |  [cat, tiger]       |
|2         |  [cat, tiger]       |
|2         |  [cat, bear, tiger] |
|2         |  [cat, bird]        |
|3         |  [tiger]            |
|3         |  [cat, bird]        |

And my final dataset might look like this:

|Id        |      column 1       |     column 2     |     column 3     |
|----------|---------------------|------------------|------------------|
|1         |  [cat, dog, bird]   |  [dog]           |          1       |
|1         |  [cat, bird]        |  [bird]          |          1       |
|1         |  [cat, bear, tiger] |  [bear]          |          1       |
|1         |  [cat, tiger]       |  [cat, tiger]    |                  |
|2         |  [cat, tiger]       |                  |          0       |
|2         |  [cat, bear, tiger] |  [bear, tiger]   |          2       |
|2         |  [cat, bird]        |  [cat, bird]     |                  |
|3         |  [tiger]            |  [tiger]         |          1       |
|3         |  [cat, bird]        |  [cat, bird]     |                  |

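Column 2 contains the elements that are in the current row's array but not in the next row's array (within the same Id), and column 3 contains the number of elements in column 2.

Thank you

So, a LEAD example: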

with data(id, _order, col1) as (
    select column1, column2, parse_json(column3)
    from values
        (1, 1, '["cat", "dog", "bird"]'),
        (1, 2, '["cat", "bird"]'),
        (1, 3, '["cat", "bear", "tiger"]'),
        (1, 4, '["cat", "tiger"]'),
        (2, 5, '["cat", "tiger"]'),
        (2, 6, '["cat", "bear", "tiger"]'),
        (2, 7, '["cat", "bird"]'),
        (3, 8, '["tiger"]'),
        (3, 9, '["cat", "bird"]')
)
select *
    ,lead(col1) over (partition by id order by _order ) as lead_col1
from data
order by _order;
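|ID |_ORDER |COL1                     |LEAD_COL1                |
|---|-------|-------------------------|-------------------------|
|1  |1      |["cat", "dog", "bird"]   |["cat", "bird"]          |
|1  |2      |["cat", "bird"]          |["cat", "bear", "tiger"] |
|1  |3      |["cat", "bear", "tiger"] |["cat", "tiger"]         |
|1  |4      |["cat", "tiger"]         |null                     |
|2  |5      |["cat", "tiger"]         |["cat", "bear", "tiger"] |
|2  |6      |["cat", "bear", "tiger"] |["cat", "bird"]          |
|2  |7      |["cat", "bird"]          |null                     |
|3  |8      |["tiger"]                |["cat", "bird"]          |
|3  |9      |["cat", "bird"]          |null                     |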

Anyway, here is the code for how I would do it:

with data(id, col1) as (
    select column1, parse_json(column2)
    from values
        (1, '["cat", "dog", "bird"]'),
        (1, '["cat", "bird"]'),
        (1, '["cat", "bear", "tiger"]'),
        (1, '["cat", "tiger"]'),
        (2, '["cat", "tiger"]'),
        (2, '["cat", "bear", "tiger"]'),
        (2, '["cat", "bird"]'),
        (3, '["tiger"]'),
        (3, '["cat", "bird"]')
), flatten as (
    select
        d.id, 
        dense_rank() over(order by f.seq) as seq, 
        f.index, 
        f.value as val
    from data as d,
    table(flatten(input=>d.col1)) f
)
select 
    a.id
    ,a.seq
    ,array_agg(a.val) within group (order by a.index) as col1
    ,array_agg(nvl2(a.val, nvl2(b.val, null, a.val), b.val)) within group (order by a.index) as col2
    ,array_size(col2) as col3
from flatten as a
full outer join flatten as b 
    on a.id = b.id and a.seq + 1 = b.seq and a.val = b.val
where a.id is not null
group by a.id, a.seq
order by a.seq;
    
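|ID |SEQ |COL1                     |COL2              |COL3 |
|---|----|-------------------------|------------------|-----|
|1  |1   |["cat", "dog", "bird"]   |["dog"]           |1    |
|1  |2   |["cat", "bird"]          |["bird"]          |1    |
|1  |3   |["cat", "bear", "tiger"] |["bear"]          |1    |
|1  |4   |["cat", "tiger"]         |["cat", "tiger"]  |2    |
|2  |5   |["cat", "tiger"]         |[]                |0    |
|2  |6   |["cat", "bear", "tiger"] |["bear", "tiger"] |2    |
|2  |7   |["cat", "bird"]          |["cat", "bird"]   |2    |
|3  |8   |["tiger"]                |["tiger"]         |1    |
|3  |9   |["cat", "bird"]          |["cat", "bird"]   |2    |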
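
As a possible shorter alternative, the LEAD result can be fed straight into Snowflake's ARRAY_EXCEPT function instead of flattening and self-joining. This is only a sketch under some assumptions: it reuses the explicit _order column from the LEAD example to define the row order within each id, it casts the parsed JSON to ARRAY because ARRAY_EXCEPT expects ARRAY arguments, and it relies on ARRAY_EXCEPT's multiset semantics, which only matters if an array can contain duplicate values. The CTE name with_next is made up for illustration.

```sql
with data(id, _order, col1) as (
    -- cast to ARRAY (assumption: ARRAY_EXCEPT is given ARRAY values, not VARIANT)
    select column1, column2, parse_json(column3)::array
    from values
        (1, 1, '["cat", "dog", "bird"]'),
        (1, 2, '["cat", "bird"]'),
        (1, 3, '["cat", "bear", "tiger"]'),
        (1, 4, '["cat", "tiger"]'),
        (2, 5, '["cat", "tiger"]'),
        (2, 6, '["cat", "bear", "tiger"]'),
        (2, 7, '["cat", "bird"]'),
        (3, 8, '["tiger"]'),
        (3, 9, '["cat", "bird"]')
), with_next as (
    -- attach the next row's array within the same id
    select
        id
        ,_order
        ,col1
        ,lead(col1) over (partition by id order by _order) as next_col1
    from data
)
select
    id
    ,_order
    ,col1
    -- the last row of each id has no successor, so it is compared with an empty array
    ,array_except(col1, coalesce(next_col1, array_construct())) as col2
    ,array_size(col2) as col3
from with_next
order by _order;
```

With the COALESCE in place, the last row of each id returns its own array, as in the join-based query above. Dropping the COALESCE should leave col2 and col3 as NULL for those rows instead, since ARRAY_EXCEPT returns NULL when either argument is NULL, which is closer to the blank cells in the question's expected output.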
