Using ARRAY_AGG() with DISTINCT and ORDER BY with ORDINAL
I have some data that I am trying to summarize (greatly simplified here). The raw data uses a schema similar to the following:
UserID - STRING
A - RECORD REPEATED
A.Action - STRING
A.Visit - INTEGER
A.Order - INTEGER
MISC - RECORD REPEATED
( other columns omitted here )
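Because of the "MISC" column there are many actual records, but I only want to focus on the first 5 columns shown above. A sample of the raw data is shown below (note that the values shown are only a sample; many other values exist, so the values cannot be hard-coded into the query):
Table 0: (sample of raw data)
(The empty values under UserID are as displayed by BigQuery; the "A" fields are part of a nested record.)
My query produces the data shown in Table 1 below. I am trying to use ARRAY_AGG with ORDINAL to select only the first two "Actions" for each user and restructure the result as shown in Table 2.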
SELECT
UserId, ARRAY_AGG( STRUCT(A.Action, A.Visit, A.Order)
ORDER BY A.Visit, A.Order, A.Action )
FROM
`table`
LEFT JOIN UNNEST(A) AS A
GROUP BY
UserId
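Table 1: (sample output of the above query)
Table 2: (desired format)
So I need:
The query strategy I tried was to order by UserID, Visit, and Order and take the DISTINCT values of Action, using the following: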
UserId,
ARRAY_AGG(DISTINCT Action ORDER BY UserID, Visit, Order) FirstAction,
ARRAY_AGG(DISTINCT Action ORDER BY UserID, Visit, Order) SecondAction
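However, that approach produces the following error:
Error: An aggregate function that has both DISTINCT and ORDER BY arguments can only ORDER BY columns that are arguments to the function
Any thoughts on how to correct this error (or an alternative approach)?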
Not sure why the original query has DISTINCT if the results shown in Table 2 don't need de-duplication.
That said:
#standardSQL
WITH sample AS (
SELECT actor.login userid, type action
, EXTRACT(HOUR FROM created_at) visit
, EXTRACT(MINUTE FROM created_at) `order`
FROM `githubarchive.day.20171005`
)
SELECT userid, actions[OFFSET(0)] firstaction, actions[SAFE_OFFSET(1)] secondaction
FROM (
SELECT userid, ARRAY_AGG(action ORDER BY visit, `order` LIMIT 2) actions
FROM sample
GROUP BY 1
ORDER BY 1
LIMIT 100
)
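A note on the error itself: when DISTINCT is present, ARRAY_AGG only accepts ORDER BY expressions that are also the aggregated argument. A minimal sketch of that rule follows; the `demo` CTE and its three rows are made up for illustration and are not the asker's data.

```sql
#standardSQL
WITH demo AS (
  -- Hypothetical rows, for illustration only
  SELECT 'U001' AS userid, 'Share' AS action, 1 AS visit UNION ALL
  SELECT 'U001', 'Register', 1 UNION ALL
  SELECT 'U001', 'Share', 2
)
SELECT userid,
  -- Accepted: the ORDER BY expression is the aggregated argument itself
  ARRAY_AGG(DISTINCT action ORDER BY action) AS distinct_actions
  -- Rejected with the error quoted in the question, because visit is not
  -- an argument of the aggregate:
  -- ARRAY_AGG(DISTINCT action ORDER BY visit)
FROM demo
GROUP BY userid
```

So DISTINCT can be kept only if the array is ordered by the action value itself; to order by visit and order, DISTINCT has to be dropped or the de-duplication done in a separate step.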
Try the below.
#standardSQL
SELECT UserId,
ARRAY_AGG(Action ORDER BY Visit, `Order`, Action LIMIT 2)[SAFE_ORDINAL(1)] AS FirstAction,
ARRAY_AGG(Action ORDER BY Visit, `Order`, Action LIMIT 2)[SAFE_ORDINAL(2)] AS SecondAction
FROM `project.dataset.table`
LEFT JOIN UNNEST(A) AS A
GROUP BY UserId
-- ORDER BY UserId
You can test / play with it using the dummy data from the question:
#standardSQL
WITH `table` AS (
SELECT 'U001' AS UserId, [STRUCT<Action STRING, Visit INT64, `Order` INT64 >
('Register', 1, 1),('Upgrade', 1, 2),('Feedback', 1, 3),('Share', 1, 4),('Share', 2, 1)] AS A UNION ALL
SELECT 'U002', [STRUCT<Action STRING, Visit INT64, `Order` INT64 >
('Share', 7, 1),('Share', 7, 2),('Refer', 8, 1),('Feedback', 8, 2),('Feedback', 8, 3)] UNION ALL
SELECT 'U003', [STRUCT<Action STRING, Visit INT64, `Order` INT64 >
('Register', 1, 1),('Share', 1, 2),('Share', 1, 3),('Share', 2, 1),('Share', 2, 2),('Share', 3, 1),('Share', 3, 2)]
)
SELECT UserId,
ARRAY_AGG(Action ORDER BY Visit, `Order`, Action LIMIT 2)[SAFE_ORDINAL(1)] AS FirstAction,
ARRAY_AGG(Action ORDER BY Visit, `Order`, Action LIMIT 2)[SAFE_ORDINAL(2)] AS SecondAction
FROM `table`
LEFT JOIN UNNEST(A) AS A
GROUP BY UserId
ORDER BY UserId
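If de-duplication is actually wanted (that is, the first two distinct actions per user, in order of first occurrence), one possible approach is to keep only the earliest row per (UserId, Action) before aggregating. This is a sketch under that assumption, not something the question confirms; the `ranked` CTE, the `act` alias, and the `rn` column are names introduced here, and the data is the U002 row from the dummy data above.

```sql
#standardSQL
WITH `table` AS (
  SELECT 'U002' AS UserId, [STRUCT<Action STRING, Visit INT64, `Order` INT64 >
    ('Share', 7, 1),('Share', 7, 2),('Refer', 8, 1),('Feedback', 8, 2),('Feedback', 8, 3)] AS A
),
ranked AS (
  -- Number the occurrences of each action per user, earliest first
  SELECT UserId, act.Action, act.Visit, act.`Order`,
    ROW_NUMBER() OVER (
      PARTITION BY UserId, act.Action
      ORDER BY act.Visit, act.`Order`) AS rn
  FROM `table`, UNNEST(A) AS act
)
SELECT UserId,
  actions[SAFE_OFFSET(0)] AS FirstAction,
  actions[SAFE_OFFSET(1)] AS SecondAction
FROM (
  -- rn = 1 keeps only the first occurrence of each action,
  -- so no DISTINCT is needed inside ARRAY_AGG
  SELECT UserId, ARRAY_AGG(Action ORDER BY Visit, `Order` LIMIT 2) AS actions
  FROM ranked
  WHERE rn = 1
  GROUP BY UserId
)
ORDER BY UserId
```

For U002 this should give Share and then Refer, where the queries above give Share twice. Note that the comma join drops users whose A array is empty, unlike the LEFT JOIN used earlier.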