
sql arrays, event occurrences with respect to prior event

I have user event data in arrays, as shown below:

Column X 
["event A", "event B", "event C", "event D", "event E"]
["event A", "event D", "event N"]
["event C", "event E", "event P"]
["event C", "event E", "event Q"]

I am trying to see, whenever a particular event occurs, which other events occur after it and how often, using the sample data above.

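So FLATTEN, ARRAY_SLICE and ARRAY_SIZE are the main tools needed here.

The CTE only fakes a data table. FLATTEN then loops over each array, and we alias that a. We could sub-select at this point to look at the next layer, but it is simpler to join straight onto the result, so that is what I did: for each element we take the tail of the array (everything after it, via ARRAY_SLICE) and flatten that too. Now we have our (event, later event) pairs and can count them.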

WITH data AS (
    SELECT parse_json(column1) as array FROM VALUES
      ( '["event A", "event B", "event C", "event D", "event E"]' ),
      ( '["event A", "event D", "event N"]' ),
      ( '["event C", "event E", "event P"]' ),
      ( '["event C", "event E", "event Q"]' )
)
SELECT 
    a.value as e_s
    ,t.value as e_o
    ,count(*) as frequency
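FROM data d,
    -- one row per element of each array; a.index is its 0-based position
    table(flatten(input=> d.array)) a,
    -- one row per element after a (ARRAY_SLICE's end bound is exclusive)
    table(flatten(input=> array_slice(d.array, a.index+1, ARRAY_SIZE(d.array)))) t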
GROUP BY 1,2
ORDER BY 1,2;

Gives:

E_S         E_O         FREQUENCY
"event A"   "event B"   1
"event A"   "event C"   1
"event A"   "event D"   2
"event A"   "event E"   1
"event A"   "event N"   1
"event B"   "event C"   1
"event B"   "event D"   1
"event B"   "event E"   1
"event C"   "event D"   1
"event C"   "event E"   3
"event C"   "event P"   1
"event C"   "event Q"   1
"event D"   "event E"   1
"event D"   "event N"   1
"event E"   "event P"   1
"event E"   "event Q"   1
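To sanity-check the table above, the same logic can be mirrored in plain Python. This is a sketch, not the answer's code: pair_frequencies and SAMPLE are hypothetical names, enumerate stands in for FLATTEN's INDEX and VALUE columns, and the slice array[index + 1:len(array)] plays the role of ARRAY_SLICE(d.array, a.index+1, ARRAY_SIZE(d.array)), whose end bound is likewise exclusive.

```python
import json
from collections import Counter

# Hypothetical sample input: the same four arrays as in the question, as JSON text.
SAMPLE = [
    '["event A", "event B", "event C", "event D", "event E"]',
    '["event A", "event D", "event N"]',
    '["event C", "event E", "event P"]',
    '["event C", "event E", "event Q"]',
]


def pair_frequencies(arrays):
    """Count (e_s, e_o) pairs where e_o appears after e_s in the same array."""
    counts = Counter()
    for array in arrays:
        # like FLATTEN: one iteration per element, with its 0-based index
        for index, e_s in enumerate(array):
            # like ARRAY_SLICE(array, index + 1, ARRAY_SIZE(array)); for the
            # last element the tail is empty, so it contributes no pairs
            tail = array[index + 1:len(array)]
            for e_o in tail:
                counts[(e_s, e_o)] += 1
    # like GROUP BY 1,2 / ORDER BY 1,2
    return sorted(counts.items())


if __name__ == "__main__":
    arrays = [json.loads(s) for s in SAMPLE]
    for (e_s, e_o), frequency in pair_frequencies(arrays):
        print(e_s, e_o, frequency)
```

An empty tail here corresponds to FLATTEN producing no rows for an empty array. Each array of length n contributes n(n-1)/2 pairs, so the frequencies add up to 10 + 3 + 3 + 3 = 19.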

A longer version, where each step is made explicit one at a time:

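WITH data AS (
    SELECT parse_json(column1) as array FROM VALUES
      ( '["event A", "event B", "event C", "event D", "event E"]' ),
      ( '["event A", "event D", "event N"]' ),
      ( '["event C", "event E", "event P"]' ),
      ( '["event C", "event E", "event Q"]' )
)
SELECT f.e_s,
    f.e_o,
    count(*) as frequency
FROM (
    -- step 3: flatten each tail, giving one (e_s, e_o) pair per row
    SELECT e.e_s,
        t.value as e_o
    FROM (
        -- step 2: flatten each array; keep the elements after the current one as tail
        SELECT
            d.array,
            a.value as e_s,
            array_slice(d.array, a.index+1, d.len) as tail
        FROM (
            -- step 1: compute each array's length once
            SELECT array,
                ARRAY_SIZE(array) as len
            FROM data
        ) d, 
            TABLE(FLATTEN(input=> d.array)) a
    ) e,
        TABLE(FLATTEN(input=> e.tail)) t
) f
GROUP BY 1,2
ORDER BY 1,2;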
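The staging above maps one-to-one onto small Python functions, which makes the intermediate tail column easy to inspect. This is a hypothetical sketch: the names step_d, step_e and step_f are mine, chosen after the subquery aliases d, e and f, and Python lists stand in for Snowflake arrays.

```python
import json
from collections import Counter

# Hypothetical sample input: the same four arrays as in the question.
SAMPLE = [
    '["event A", "event B", "event C", "event D", "event E"]',
    '["event A", "event D", "event N"]',
    '["event C", "event E", "event P"]',
    '["event C", "event E", "event Q"]',
]


def step_d(arrays):
    # subquery d: each array with its length, like ARRAY_SIZE(array) as len
    return [(array, len(array)) for array in arrays]


def step_e(d_rows):
    # subquery e: FLATTEN each array (enumerate gives index and value) and keep
    # tail = ARRAY_SLICE(array, index + 1, len); the end bound is exclusive
    return [
        (e_s, array[index + 1:length])
        for array, length in d_rows
        for index, e_s in enumerate(array)
    ]


def step_f(e_rows):
    # subquery f: FLATTEN each tail, one (e_s, e_o) row per later event;
    # an empty tail yields no rows
    return [(e_s, e_o) for e_s, tail in e_rows for e_o in tail]


def frequencies(f_rows):
    # outer query: GROUP BY e_s, e_o with count(*), then ORDER BY 1, 2
    return sorted(Counter(f_rows).items())


if __name__ == "__main__":
    arrays = [json.loads(s) for s in SAMPLE]
    # intermediate rows of subquery e for the second array only
    for row in step_e(step_d(arrays[1:2])):
        print(row)
    print(frequencies(step_f(step_e(step_d(arrays))))[:3])
```

For ["event A", "event D", "event N"] the e stage yields ('event A', ['event D', 'event N']), ('event D', ['event N']) and ('event N', []); the last row's empty tail is why "event N" never appears as an E_S.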

I wasn't quick enough to beat Simeon, but we ended up with different approaches, so pick whichever suits you best!

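I flatten the arrays into rows in a CTE, join the CTE back to itself, and then aggregate the results. The join keeps only pairs that come from the same source row (same SEQ) and where the second element comes later in the array (higher INDEX).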

Query

with flat as (
    select *
    from test_table,
         table (flatten(test_table.col_x)) f
)
select
    a.value  as E_S,
    b.value  as E_O,
    count(1) as FREQUENCY
from flat a
         -- pair each element with every later element (higher INDEX) from the same source row (same SEQ)
         join flat b on a.seq = b.seq and a.INDEX < b.INDEX
group by a.value, b.value
order by a.value, b.value

Full example

-- create sample table
create or replace transient table test_table
(
    col_x array
);

-- insert sample data
insert overwrite into test_table (col_x)
SELECT
    parse_json(column1)
FROM
VALUES ('["event A", "event B", "event C", "event D", "event E"]'),
       ('["event A", "event D", "event N"]'),
       ('["event C", "event E", "event P"]'),
       ('["event C", "event E", "event Q"]')
;

with flat as (
    select *
    from test_table,
         table (flatten(test_table.col_x)) f
)
select
    a.value  as E_S,
    b.value  as E_O,
    count(1) as FREQUENCY
from flat a
         join flat b on a.seq = b.seq and a.INDEX < b.INDEX
group by a.value, b.value
order by a.value, b.value
;

Results

+---------+---------+---------+
|E_S      |E_O      |FREQUENCY|
+---------+---------+---------+
|"event A"|"event B"|1        |
|"event A"|"event C"|1        |
|"event A"|"event D"|2        |
|"event A"|"event E"|1        |
|"event A"|"event N"|1        |
|"event B"|"event C"|1        |
|"event B"|"event D"|1        |
|"event B"|"event E"|1        |
|"event C"|"event D"|1        |
|"event C"|"event E"|3        |
|"event C"|"event P"|1        |
|"event C"|"event Q"|1        |
|"event D"|"event E"|1        |
|"event D"|"event N"|1        |
|"event E"|"event P"|1        |
|"event E"|"event Q"|1        |
+---------+---------+---------+
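The self-join idea is not tied to Snowflake. As a rough, hypothetical port, here it is against SQLite through Python's built-in sqlite3 module, assuming a SQLite build with the JSON functions available (they are built in by default since SQLite 3.38). SQLite's json_each table-valued function stands in for FLATTEN: its key column is the array index, and the table's integer id stands in for SEQ. The names pair_frequencies_sqlite, SAMPLE and the id column are illustrative.

```python
import sqlite3

# Hypothetical sample input: the same four arrays, stored as JSON text.
SAMPLE = [
    '["event A", "event B", "event C", "event D", "event E"]',
    '["event A", "event D", "event N"]',
    '["event C", "event E", "event P"]',
    '["event C", "event E", "event Q"]',
]

QUERY = """
    WITH flat AS (
        -- json_each plays the role of FLATTEN; t.id plays the role of SEQ
        SELECT t.id AS seq, f.key AS idx, f.value AS value
        FROM test_table AS t, json_each(t.col_x) AS f
    )
    SELECT a.value AS e_s, b.value AS e_o, COUNT(*) AS frequency
    FROM flat AS a
    -- same source row, and b comes later in the array than a
    JOIN flat AS b ON a.seq = b.seq AND a.idx < b.idx
    GROUP BY a.value, b.value
    ORDER BY a.value, b.value
"""


def pair_frequencies_sqlite(rows):
    """Return (e_s, e_o, frequency) tuples using a flatten + self-join in SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test_table (id INTEGER PRIMARY KEY, col_x TEXT)")
        conn.executemany(
            "INSERT INTO test_table (col_x) VALUES (?)", [(r,) for r in rows]
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for e_s, e_o, frequency in pair_frequencies_sqlite(SAMPLE):
        print(e_s, e_o, frequency)
```

Unlike Snowflake's VARIANT output, json_each returns JSON strings as plain SQL text, so the values come back without the surrounding double quotes; the pairs and counts should otherwise match the table above.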
