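How to find the difference between arrays in consecutive rows in Snowflake SQL?
I am trying to compare two arrays in consecutive rows of the same column (within each group) and return the differing elements and their count in separate columns. I would like a result similar to the one in the linked question "Compare two arrays and count the number of same strings". There, however, the arrays are compared between columns, whereas I want to compare them between rows.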
So, my initial dataset might look like this:
| Id | column 1           |
|----|--------------------|
| 1  | [cat, dog, bird]   |
| 1  | [cat, bird]        |
| 1  | [cat, bear, tiger] |
| 1  | [cat, tiger]       |
| 2  | [cat, tiger]       |
| 2  | [cat, bear, tiger] |
| 2  | [cat, bird]        |
| 3  | [tiger]            |
| 3  | [cat, bird]        |
So, my final dataset might look like this:
| Id | column 1           | column 2      | column 3 |
|----|--------------------|---------------|----------|
| 1  | [cat, dog, bird]   | [dog]         | 1        |
| 1  | [cat, bird]        | [bird]        | 1        |
| 1  | [cat, bear, tiger] | [bear]        | 1        |
| 1  | [cat, tiger]       | [cat, tiger]  |          |
| 2  | [cat, tiger]       |               | 0        |
| 2  | [cat, bear, tiger] | [bear, tiger] | 2        |
| 2  | [cat, bird]        | [cat, bird]   |          |
| 3  | [tiger]            | [tiger]       | 1        |
| 3  | [cat, bird]        | [cat, bird]   |          |
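Column 2 contains the elements that are in the first array but not in the second (the array in the following row of the same Id), and column 3 contains the number of elements in column 2.
Thank you.
So, a LEAD example: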
with data(id, _order, col1) as (
select column1, column2, parse_json(column3)
from values
(1, 1, '["cat", "dog", "bird"]'),
(1, 2, '["cat", "bird"]'),
(1, 3, '["cat", "bear", "tiger"]'),
(1, 4, '["cat", "tiger"]'),
(2, 5, '["cat", "tiger"]'),
(2, 6, '["cat", "bear", "tiger"]'),
(2, 7, '["cat", "bird"]'),
(3, 8, '["tiger"]'),
(3, 9, '["cat", "bird"]')
)
select *
,lead(col1) over (partition by id order by _order ) as lead_col1
from data
order by _order;
ID | _ORDER | COL1 | LEAD_COL1 |
---|---|---|---|
1 | 1 | ["cat", "dog", "bird"] | ["cat", "bird"] |
1 | 2 | ["cat", "bird"] | ["cat", "bear", "tiger"] |
1 | 3 | ["cat", "bear", "tiger"] | ["cat", "tiger"] |
1 | 4 | ["cat", "tiger"] | null |
2 | 5 | ["cat", "tiger"] | ["cat", "bear", "tiger"] |
2 | 6 | ["cat", "bear", "tiger"] | ["cat", "bird"] |
2 | 7 | ["cat", "bird"] | null |
3 | 8 | ["tiger"] | ["cat", "bird"] |
3 | 9 | ["cat", "bird"] | null |
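Once the next row's array sits beside the current one, the difference could in principle be taken directly. The following is only a sketch, not part of the original answer: it assumes the built-in ARRAY_EXCEPT function is available in your Snowflake account, casts the parsed JSON to ARRAY so both arguments are real arrays, and treats a missing next row as an empty array (an assumption made here so that the last row of each id keeps all of its elements). The CTE name with_next and the column next_col1 are made up for illustration.

```sql
with data(id, _order, col1) as (
    select column1, column2, parse_json(column3)::array
    from values
        (1, 1, '["cat", "dog", "bird"]'),
        (1, 2, '["cat", "bird"]'),
        (1, 3, '["cat", "bear", "tiger"]'),
        (1, 4, '["cat", "tiger"]'),
        (2, 5, '["cat", "tiger"]'),
        (2, 6, '["cat", "bear", "tiger"]'),
        (2, 7, '["cat", "bird"]'),
        (3, 8, '["tiger"]'),
        (3, 9, '["cat", "bird"]')
), with_next as (
    -- pull the following row's array (within the same id) next to the current one
    select
        id,
        _order,
        col1,
        lead(col1) over (partition by id order by _order) as next_col1
    from data
)
select
    id,
    col1,
    -- elements of col1 that are not in the next row's array;
    -- coalesce avoids a NULL result when there is no next row
    array_except(col1, coalesce(next_col1, array_construct())) as col2,
    array_size(col2) as col3
from with_next
order by _order;
```

This variant needs an explicit ordering column (_order here), since LEAD has no other way to know which row counts as the "following" one.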
Anyway, here is the code for how I would do it:
with data(id, col1) as (
select column1, parse_json(column2)
from values
(1, '["cat", "dog", "bird"]'),
(1, '["cat", "bird"]'),
(1, '["cat", "bear", "tiger"]'),
(1, '["cat", "tiger"]'),
(2, '["cat", "tiger"]'),
(2, '["cat", "bear", "tiger"]'),
(2, '["cat", "bird"]'),
(3, '["tiger"]'),
(3, '["cat", "bird"]')
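), flatten as (
-- one row per array element; seq numbers the source rows in input order
select
d.id,
dense_rank() over(order by f.seq) as seq,
f.index,
f.value as val
from data as d,
table(flatten(input=>d.col1)) f
)
select
a.id
,a.seq
-- rebuild the original array in element order
,array_agg(a.val) within group (order by a.index) as col1
-- keep a.val only when the next row (b) has no matching value; array_agg skips nulls
,array_agg(nvl2(a.val, nvl2(b.val, null, a.val), b.val)) within group (order by a.index) as col2
,array_size(col2) as col3
from flatten as a
-- match each element against the same value in the next row of the same id
full outer join flatten as b
on a.id = b.id and a.seq + 1 = b.seq and a.val = b.val
-- drop rows that exist only on the b side
where a.id is not null
group by a.id, a.seq
order by a.seq;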
ID | SEQ | COL1 | COL2 | COL3 |
---|---|---|---|---|
1 | 1 | ["cat", "dog", "bird"] | ["dog"] | 1 |
1 | 2 | ["cat", "bird"] | ["bird"] | 1 |
1 | 3 | ["cat", "bear", "tiger"] | ["bear"] | 1 |
1 | 4 | ["cat", "tiger"] | ["cat", "tiger"] | 2 |
2 | 5 | ["cat", "tiger"] | [] | 0 |
2 | 6 | ["cat", "bear", "tiger"] | ["bear", "tiger"] | 2 |
2 | 7 | ["cat", "bird"] | ["cat", "bird"] | 2 |
3 | 8 | ["tiger"] | ["tiger"] | 1 |
3 | 9 | ["cat", "bird"] | ["cat", "bird"] | 2 |