Redshift Alternative for Correlated Sub-Query
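I'm using Redshift and need an alternative to a correlated subquery. I'm getting the "correlated subquery not supported" error. However, for this particular exercise, which tries to identify all sales transactions made by the same customer within a given hour of an initial transaction, I'm not sure a traditional left join would work either, i.e. the query depends on the context, or current value, of the parent select. I've also tried something similar with the row_number() window function, but again I need a way to window/partition over a date range, not just over customer_id.
The overall goal is to find the first transaction for a given customer ID, then find all subsequent transactions made within 60 minutes of that first transaction. This logic continues for the rest of the transactions for the same customer (and ultimately for all customers in the database). That is, once the initial 60-minute window has been established starting from the first transaction, a second 60-minute window begins at the end of the first one, all transactions within the second window are identified and combined as well, and the process repeats for the remaining transactions.
The output would list the first transaction ID that started the 60-minute window, followed by the other transaction IDs made within that window. The second row would show the first transaction ID made by the same customer in the next 60-minute window (again, the first transaction after the first 60-minute window is the start of the second 60-minute window), followed by the subsequent transactions made within that second window.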
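The rule described above is sequential: where each window starts depends on where the previous one ended, which is why a plain join or a fixed partition does not express it directly. As a reference for what any SQL rewrite should return, here is a minimal Python sketch of that rule. The function name `group_into_windows` and the tuple layout are assumptions made for illustration, the rows are the sample transactions given later in this question, and the sketch takes the reading in which a new window opens at the first transaction that falls outside the current one.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=60)

def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

def group_into_windows(rows):
    """Group (customer_id, transaction_id, order_time) rows into 60-minute windows.

    Returns (customer_id, [transaction ids]) pairs; the first id in each list
    is the anchor transaction that opened the window.
    """
    windows = []
    anchor_time = {}  # customer_id -> order_time of the current anchor
    current = {}      # customer_id -> id list of that customer's open window
    for customer_id, transaction_id, order_time in sorted(rows, key=lambda r: (r[0], r[2])):
        start = anchor_time.get(customer_id)
        if start is None or order_time > start + WINDOW:
            # First row for this customer, or outside the open window: new anchor.
            anchor_time[customer_id] = order_time
            current[customer_id] = [transaction_id]
            windows.append((customer_id, current[customer_id]))
        else:
            current[customer_id].append(transaction_id)
    return windows

sample = [
    (1234, 33453, ts("2017-06-05 13:30")),
    (1234, 88472, ts("2017-06-05 13:45")),
    (1234, 88477, ts("2017-06-05 14:10")),
    (1234, 99321, ts("2017-06-07 08:30")),
    (1234, 99345, ts("2017-06-07 08:45")),
]

result = group_into_windows(sample)
for customer_id, ids in result:
    print(customer_id, ids)
```

Each printed row starts with the anchor transaction, followed by the transactions that fall within 60 minutes of it (inclusive, matching the `<=` comparison used in the query).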
An example of the query in its most basic form looks like this:
select
s1.customer_id,
s1.transaction_id,
s1.order_time,
(
select
s2.transaction_id
from
sales s2
where
s2.order_time > s1.order_time and
s2.order_time <= dateadd(m,60,s1.order_time) and
s2.customer_id = s1.customer_id
order by
s2.order_time asc
limit 1
) as sales_transaction_id_1,
(
select
s3.transaction_id
from
sales s3
where
s3.order_time > s1.order_time and
s3.order_time <= dateadd(m,60,s1.order_time) and
s3.customer_id = s1.customer_id
order by
s3.order_time asc
limit 1 offset 1
) as sales_transaction_id_2,
(
select
s4.transaction_id
from
sales s4
where
s4.order_time > s1.order_time and
s4.order_time <= dateadd(m,60,s1.order_time) and
s4.customer_id = s1.customer_id
order by
s4.order_time asc
limit 1 offset 2
) as sales_transaction_id_3
from
(
select
sales.customer_id,
sales.transaction_id,
sales.order_time
from
sales
order by
sales.order_time desc
) s1;
For example, if a customer made the following transactions:
customer_id  transaction_id  order_time
1234         33453           2017-06-05 13:30
1234         88472           2017-06-05 13:45
1234         88477           2017-06-05 14:10
1234         99321           2017-06-07 8:30
1234         99345           2017-06-07 8:45
The expected output would be:
customer_id  transaction_id  sales_transaction_id_1  sales_transaction_id_2  sales_transaction_id_3
1234         33453           88472                   88477                   NULL
1234         99321           99345                   NULL                    NULL
Also, Redshift doesn't appear to support lateral joins, which seems to limit my options further. Any help would be greatly appreciated.
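Based on your description, you just want group by and some sort of date difference. I'm not sure how you want to combine the rows, but this is the basic idea: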
select s.customer_id,
min(order_time) as first_order_in_hour,
max(order_time) as last_order_in_hour,
count(*) as num_orders
from (select s.*,
min(order_time) over (partition by customer_id) as min_ot
from sales s
) s
group by customer_id, floor(datediff(second, min_ot, order_time) / (60 * 60));
In Postgres, this form (or something similar, because Postgres doesn't have datediff()) would also be faster.
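To see what the grouping expression does, the sketch below mirrors `floor(datediff(second, min_ot, order_time) / (60 * 60))` in Python. It is an illustration only, not Redshift code, and the helper name `hour_bucket` is made up here. The first list holds the order times of customer 1234 from the question; the second list is hypothetical data added to show that the buckets are fixed hour-long slots counted from the customer's first order, so they need not line up with windows that restart at each new anchor transaction.

```python
from datetime import datetime

def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

def hour_bucket(first_order, order_time):
    # Whole seconds since the customer's first order, integer-divided by 3600.
    return int((order_time - first_order).total_seconds()) // 3600

# Order times of customer 1234 from the question.
sample = [
    ts("2017-06-05 13:30"),
    ts("2017-06-05 13:45"),
    ts("2017-06-05 14:10"),
    ts("2017-06-07 08:30"),
    ts("2017-06-07 08:45"),
]
sample_buckets = [hour_bucket(min(sample), t) for t in sample]
print(sample_buckets)  # [0, 0, 0, 43, 43]

# Hypothetical order times (not from the question): 01:30 and 02:10 are only
# 40 minutes apart, yet they fall into different fixed buckets.
spread = [ts("2017-06-05 00:00"), ts("2017-06-05 01:30"), ts("2017-06-05 02:10")]
spread_buckets = [hour_bucket(min(spread), t) for t in spread]
print(spread_buckets)  # [0, 1, 2]
```

On the question's sample the buckets match the expected two groups; on the second list they split two orders that a window anchored at 01:30 would keep together.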
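You can use window functions to get the subsequent transactions for each transaction. The window would be customer/hour, and you can rank the records to find the first "anchor" transaction and then pick up as many subsequent transactions as you need: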
with
transaction_chains as (
select
customer_id
,transaction_id
,order_time
-- rank transactions within window to find the first "anchor" transaction
,row_number() over (partition by customer_id,date_trunc('hour',order_time) order by order_time)
-- 1st next order
,lead(transaction_id,1) over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as transaction_id_1
,lead(order_time,1) over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as order_time_1
-- 2nd next order
,lead(transaction_id,2) over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as transaction_id_2
,lead(order_time,2) over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as order_time_2
-- 3rd next order
,lead(transaction_id,3) over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as transaction_id_3
,lead(order_time,3) over (partition by customer_id,date_trunc('hour',order_time) order by order_time) as order_time_3
from sales
)
select
customer_id
,transaction_id
,transaction_id_1
,transaction_id_2
,transaction_id_3
from transaction_chains
where row_number=1;
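One thing worth checking with this approach is where the partition boundaries fall. Assuming the partition key is the calendar hour (`date_trunc('hour', order_time)`, i.e. the customer/hour window described above), the Python sketch below groups the sample transactions from the question the same way. It is an illustration rather than Redshift code, and `hour_partitions` is a name made up here.

```python
from collections import defaultdict
from datetime import datetime

def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

def hour_partitions(rows):
    """Group (customer_id, transaction_id, order_time) rows by customer and calendar hour."""
    parts = defaultdict(list)
    for customer_id, transaction_id, order_time in sorted(rows, key=lambda r: (r[0], r[2])):
        # Equivalent of date_trunc('hour', order_time).
        hour = order_time.replace(minute=0, second=0, microsecond=0)
        parts[(customer_id, hour)].append(transaction_id)
    # One id list per partition, ordered by customer and hour.
    return [ids for _, ids in sorted(parts.items())]

sample = [
    (1234, 33453, ts("2017-06-05 13:30")),
    (1234, 88472, ts("2017-06-05 13:45")),
    (1234, 88477, ts("2017-06-05 14:10")),
    (1234, 99321, ts("2017-06-07 08:30")),
    (1234, 99345, ts("2017-06-07 08:45")),
]

chains = hour_partitions(sample)
print(chains)  # [[33453, 88472], [88477], [99321, 99345]]
```

On this sample, transaction 88477 (14:10) lands in a different calendar hour than its anchor 33453 (13:30), so calendar-hour partitions produce three chains where the question's expected output has two.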