Postgres SQL Query With Nested Sub-Query Taking Too Long
I am working with a database of transaction records that has millions of rows and the following columns/setup:
orderdate | orderid | customerid | product | price | total_amount |
---|---|---|---|---|---|
30/02/2018 | online56134 | 492512952 | 125582 | 50 | 50 |
20/05/2020 | offline14452 | 291312855 | 125582 | 50 | 82 |
20/05/2020 | offline14452 | 291312855 | 291824 | 32 | 82 |
15/07/2015 | offline29528 | 192501431 | 693012 | 71 | 71 |
01/09/2017 | offline53422 | 291367825 | Donation | 10 | 20 |
01/09/2017 | offline53422 | 291367825 | 214257 | 10 | 20 |
16/11/2016 | online63642 | NULL | 639102 | 53 | 53 |
01/11/2017 | online96458 | 891367243 | Shipping | 10 | 10 |
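I want to find the average yearly spend of all customers who have made a transaction within the last three years and have never made an offline transaction. I have a query that runs fast enough for all customers: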
SELECT
(SELECT SUM(CAST(total_amount AS NUMERIC)) FROM (SELECT DISTINCT orderid, total_amount, orderdate
FROM sales WHERE orderdate > (NOW() - INTERVAL '12 month') AND customerid IS NOT NULL AND product
NOT LIKE 'SHIP%' AND product NOT LIKE 'Ship%' AND product != 'DONATION' AND product != 'Donation'
AND customerid NOT LIKE '111222333%') AS "Total Sales - Returns"
)
/
(SELECT COUNT(DISTINCT customerid) FROM sales WHERE orderdate BETWEEN (NOW() - INTERVAL '3 years')
AND NOW() AND product NOT LIKE 'SHIP%' AND product NOT LIKE 'Ship%' AND product != 'DONATION' AND
product != 'Donation' AND customerid NOT LIKE '111222333%'
);
However, my solution for online-only customers relies on inefficient nested subqueries, which slow the query down considerably:
SELECT
(SELECT SUM(CAST(total_amount AS NUMERIC)) FROM (SELECT DISTINCT orderid, total_amount, orderdate
FROM sales WHERE orderdate > (NOW() - INTERVAL '12 month') AND customerid IS NOT NULL AND product
NOT LIKE 'SHIP%' AND product NOT LIKE 'Ship%' AND product != 'DONATION' AND product != 'Donation'
AND customerid NOT LIKE '111222333%' AND customerid NOT IN (SELECT customerid FROM sales WHERE
orderid NOT LIKE 'online%')) AS "Total Sales - Returns"
)
/
(SELECT COUNT(DISTINCT customerid) FROM sales WHERE orderdate BETWEEN (NOW() - INTERVAL '3 years')
AND NOW() AND product NOT LIKE 'SHIP%' AND product NOT LIKE 'Ship%' AND product != 'DONATION' AND
product != 'Donation' AND customerid NOT LIKE '111222333%' AND customerid NOT IN (SELECT
customerid FROM sales WHERE orderid NOT LIKE 'online%')
);
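Overall, I have many similar queries (e.g. average number of transactions, time between transactions, date of first purchase, etc.). I therefore need to apply the same online-only logic to many queries, and I also need a version that excludes online-only customers. In effect there are three sets of queries: one for all customers, one for online-only customers, and one that excludes online-only customers.
Does anyone have suggestions on how I can speed up the query above and the other online-only customer queries?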
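If I follow you correctly, you can get the yearly average over the last 3 years for customers that have only online sales with the following query: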
select customerid, sum(total_amount) / 3 as avg_year_amount
from sales
where orderdate > current_date - interval '3 year'
group by customerid
having bool_and(orderid like 'online%')
If you want the overall average for such customers, you can add another level of aggregation:
select avg(avg_year_amount) as grand_avg
from (
select customerid, sum(total_amount) / 3 as avg_year_amount
from sales
where orderdate > current_date - interval '3 year'
group by customerid
having bool_and(orderid like 'online%')
) t
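Your query has additional filters in the where clause that are not described in the question. You can add them to the where clause of the subquery as needed.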
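As a rough sketch of what adding those filters could look like, the block below copies the conditions from the question into the subquery. The table `sales_demo`, its column types, and the sample rows are assumptions made only so the example can run on its own. Because the question casts `total_amount` and de-duplicates order lines with `SELECT DISTINCT`, the sketch does the same before summing; that step is my addition, not part of the query above.

```sql
-- Hypothetical demo table; names and types are assumptions, not the real schema.
create temp table sales_demo (
    orderdate date, orderid text, customerid text,
    product text, price numeric, total_amount text
);
insert into sales_demo values
    (current_date - 30,  'online1',  'A', '125582',   50, '80'),
    (current_date - 30,  'online1',  'A', '291824',   30, '80'),  -- second line of the same order
    (current_date - 400, 'online2',  'A', '125582',   40, '40'),
    (current_date - 10,  'online5',  'A', 'Shipping', 10, '10'),  -- removed by the product filter
    (current_date - 60,  'online3',  'B', '125582',   60, '60'),
    (current_date - 90,  'offline4', 'B', '125582',   20, '20');  -- makes B not online-only

select avg(avg_year_amount) as grand_avg
from (
    select customerid, sum(total_amount) / 3 as avg_year_amount
    from (
        -- one row per order, so multi-line orders are not counted twice
        select distinct customerid, orderid,
               cast(total_amount as numeric) as total_amount
        from sales_demo
        where orderdate > current_date - interval '3 year'
          and customerid is not null
          and product not like 'SHIP%' and product not like 'Ship%'
          and product not in ('DONATION', 'Donation')
          and customerid not like '111222333%'
    ) o
    group by customerid
    having bool_and(orderid like 'online%')
) t;
```

With these sample rows only customer A qualifies, with orders of 80 and 40, so the query returns 40. Keep in mind that rows removed by the where clause are never seen by `bool_and`, so an offline order older than three years would not disqualify a customer in this form.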
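I want to find the average yearly spend of all customers who have made a transaction within the last three years and have never made an offline transaction.
I don't find this description entirely clear. Let me assume that you want the spend over the last three years, averaged per year, for customers whose orders were all online.
Note: A customer who had exactly one transaction of 300, made 2.5 years ago, would count as 100 per year (if included).
Then: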
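-- divide by the number of customers (not rows) to get a per-customer yearly average
select sum(total_amount) / (3 * count(distinct customerid)) as yearly_average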
from (select s.*,
bool_and(orderid like 'online%') over (partition by customerid) as always_online
from sales s
) s
where always_online and
orderdate > current_date - interval '3 year';
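To illustrate why the flag is computed in the subquery, before the date filter, here is a small sketch on made-up data. The table `sales_hist_demo`, its types, and its rows are hypothetical. The point it tries to show is that the window function looks at a customer's whole history, so an old offline order still marks the customer as not always online.

```sql
-- Hypothetical demo table; names and types are assumptions, not the real schema.
create temp table sales_hist_demo (
    orderdate date, orderid text, customerid text, total_amount numeric
);
insert into sales_hist_demo values
    (current_date - 30,   'online1',  'A', 90),
    (current_date - 60,   'online2',  'C', 30),
    (current_date - 2000, 'offline3', 'C', 45);  -- offline, but older than three years

-- The window has no ORDER BY, so bool_and covers every row of the customer.
select customerid, orderid, orderdate,
       bool_and(orderid like 'online%') over (partition by customerid) as always_online
from sales_hist_demo
order by customerid, orderid;
```

Both rows of customer C come back with `always_online` false, so a later filter on `always_online` drops C even though its only recent order is an online one. Only customer A keeps the flag.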
My guess is that the subquery
(SELECT customerid FROM sales WHERE orderid NOT LIKE 'online%')
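is evaluated repeatedly for every row and returns the same result each time, which wastes a lot of time. If the subquery is first put into a temporary result set (a common table expression), like this: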
WITH offcus (id) AS (
    SELECT customerid FROM sales
    WHERE orderid NOT LIKE 'online%')
SELECT ... AND customerid NOT IN (SELECT id FROM offcus) ...
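your query may be as fast as your "fast enough" query, although I have not tested this myself. The EXPLAIN command is worth a try, as it clearly shows how the query is executed.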
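A minimal sketch of checking the plan, using a hypothetical table `sales_plan_demo` with assumed column types. Note that `explain (analyze)` actually runs the statement, so on the real table it takes as long as the query itself. In the output, it is worth checking whether the NOT IN subquery shows up as a "hashed SubPlan" or as a plain "SubPlan"; as far as I know the latter is the slow case on large tables. The NOT EXISTS form at the end is a commonly suggested alternative, not something from the answer above, and whether it is faster on your data is exactly what EXPLAIN should tell you.

```sql
-- Hypothetical demo table; names and types are assumptions, not the real schema.
create temp table sales_plan_demo (orderid text, customerid text, total_amount numeric);
insert into sales_plan_demo values
    ('online1',  'A',  80),
    ('online2',  'B',  60),
    ('offline3', 'B',  20),
    ('online4',  null, 53);

-- Show the real execution plan and timings of the NOT IN form.
-- The "is not null" guard matters: a single NULL in the list makes NOT IN never true.
explain (analyze, buffers)
select distinct customerid
from sales_plan_demo
where customerid not in (select customerid
                         from sales_plan_demo
                         where orderid not like 'online%'
                           and customerid is not null);

-- Alternative formulation with NOT EXISTS, which the planner can run as an anti-join.
select distinct s.customerid
from sales_plan_demo s
where s.customerid is not null
  and not exists (select 1
                  from sales_plan_demo o
                  where o.customerid = s.customerid
                    and o.orderid not like 'online%');
```

On this sample data the last query returns only customer A. One more caveat: since PostgreSQL 12 a CTE that is referenced once is inlined into the main query by default, so the WITH form may produce the same plan as the original unless it is written as `AS MATERIALIZED`.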