Postgres SQL Query With Nested Sub-Query Taking Too Long

I am working with a transaction records database with many millions of rows and the following columns / setup:

    Orderdate   OrderID        CustomerId  Product   Price  Total_Amount
    30/02/2018  online-56134   492512952   125582    50     50
    20/05/2020  offline-14452  291312855   125582    50     82
    20/05/2020  offline-14452  291312855   291824    32     82
    15/07/2015  offline-29528  192501431   693012    71     71
    09/01/2017  offline-53422  291367825   Donation  10     20
    09/01/2017  offline-53422  291367825   214257    10     20
    16/11/2016  online-63642   NULL        639102    53     53
    11/01/2017  online-96458   891367243   Shipping  10     10

I want to find out the average annual spend of all customers who have transacted in the past three years and have never transacted offline. I have a query which runs fast enough for all customers:

    SELECT
       (SELECT SUM(CAST(total_amount AS NUMERIC)) FROM (SELECT DISTINCT orderid, total_amount, orderdate 
        FROM sales WHERE orderdate > (NOW() - INTERVAL '12 month') AND customerid IS NOT NULL AND product 
        NOT LIKE 'SHIP%' AND product NOT LIKE 'Ship%' AND product != 'DONATION' AND product != 'Donation' 
        AND customerid NOT LIKE '111222333%') AS "Total Sales - Returns"
       )
    /
       (SELECT COUNT(DISTINCT customerid) FROM sales WHERE orderdate BETWEEN (NOW() - INTERVAL '3 years') 
        AND NOW() AND product NOT LIKE 'SHIP%' AND product NOT LIKE 'Ship%' AND product != 'DONATION' AND 
        product != 'Donation' AND customerid NOT LIKE '111222333%'
       );

However, my solution for online-only customers includes inefficient nested subqueries, which slow the query down significantly:

    SELECT
       (SELECT SUM(CAST(total_amount AS NUMERIC)) FROM (SELECT DISTINCT orderid, total_amount, orderdate 
        FROM sales WHERE orderdate > (NOW() - INTERVAL '12 month') AND customerid IS NOT NULL AND product 
        NOT LIKE 'SHIP%' AND product NOT LIKE 'Ship%' AND product != 'DONATION' AND product != 'Donation' 
        AND customerid NOT LIKE '111222333%' AND customerid NOT IN (SELECT customerid FROM sales WHERE 
        orderid NOT LIKE 'online%')) AS "Total Sales - Returns"
       )
    /
       (SELECT COUNT(DISTINCT customerid) FROM sales WHERE orderdate BETWEEN (NOW() - INTERVAL '3 years') 
        AND NOW() AND product NOT LIKE 'SHIP%' AND product NOT LIKE 'Ship%' AND product != 'DONATION' AND 
        product != 'Donation' AND customerid NOT LIKE '111222333%' AND customerid NOT IN (SELECT 
        customerid FROM sales WHERE orderid NOT LIKE 'online%')
       );

Overall, I have many similar queries (such as some for average transaction quantity, time between transactions, first purchase date and more). Thus, I need to apply similar online-only logic to many queries, and I also need versions that exclude online-only customers. In other words, there are three sets of queries: one for all customers, one for online-only customers, and one that excludes online-only customers.

Does anyone have advice on how I can significantly speed up the above query and the other online-only customer queries?

If I follow you correctly, you can get the average yearly spend over the last 3 years for customers that only had online sales with the following query:

select customerid, sum(total_amount) / 3 as avg_year_amount
from sales
where orderdate > current_date - interval '3 year'
group by customerid
having bool_and(orderid like 'online%')

If you want the overall average of such customers, you can add another level of aggregation:

select avg(avg_year_amount) as grand_avg
from (
    select customerid, sum(total_amount) / 3 as avg_year_amount
    from sales
    where orderdate > current_date - interval '3 year'
    group by customerid
    having bool_and(orderid like 'online%')
) t

Your query has additional filters in the where clauses that are not described in the question. You can add them to the where clause of the subquery as needed.
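As a rough, untested sketch of what adding those filters might look like, the version below copies the conditions from the question's own query into the subquery. The inner `select distinct` is an assumption on my part: it mirrors the question's `SELECT DISTINCT orderid, total_amount, ...`, since the sample data repeats `total_amount` on every line item of an order. The cast to `numeric` is likewise taken from the question.

```sql
select avg(avg_year_amount) as grand_avg
from (
    select customerid,
           sum(cast(total_amount as numeric)) / 3 as avg_year_amount
    from (
        -- one row per order, because total_amount repeats on each line item
        select distinct customerid, orderid, total_amount
        from sales
        where orderdate > current_date - interval '3 year'
          and customerid is not null
          and customerid not like '111222333%'
          and product not like 'SHIP%' and product not like 'Ship%'
          and product not in ('DONATION', 'Donation')
    ) o
    group by customerid
    -- keep customers whose remaining orders are all online
    having bool_and(orderid like 'online%')
) t;
```

Note that `bool_and` only sees the rows that survive the `where` clause, so this checks for offline orders within the three-year window (and among the non-excluded products), not over the customer's whole history.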

I want to find out the average annual spend of all customers who have transacted in the past three years, and have never transacted offline.

I don't find this explanation totally clear. Let me assume that you want:

  • Any customer who has had any transaction in the past three years.
  • The total spend over those three years, divided by 3.
  • Only customers who were always online, both during the three years and before.

Note: A customer who has exactly one transaction of 300, made 2.5 years ago, would count as 100 per year (if included).

Then:

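-- count(distinct customerid) rather than count(*): average per customer, not per sales row
select sum(total_amount) / (3 * count(distinct customerid)) as yearly_average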
from (select s.*,
             bool_and(orderid like 'online%') over (partition by customerid) as always_online
      from sales s
     ) s
where always_online and
      orderdate > current_date - interval '3 year';
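Since the question mentions three sets of queries (all customers, online-only, and excluding online-only), one option might be to compute the per-customer flag once and reuse it. The following is only a sketch under assumptions: `customer_channel` and the index name are hypothetical, and the final query copies the three-year logic above without deduplicating line items.

```sql
-- Hypothetical helper: one row per customer with an online-only flag.
create materialized view customer_channel as
select customerid,
       bool_and(orderid like 'online%') as always_online
from sales
where customerid is not null
group by customerid;

create unique index customer_channel_customerid_idx
    on customer_channel (customerid);

-- Online-only variant; use "not c.always_online" for the excluding variant.
select sum(cast(s.total_amount as numeric))
       / (3 * count(distinct s.customerid)) as yearly_average
from sales s
join customer_channel c on c.customerid = s.customerid
where c.always_online
  and s.orderdate > current_date - interval '3 year';

-- The view is a snapshot; refresh it after new sales are loaded.
refresh materialized view customer_channel;
```

The trade-off is staleness: the flag is only as current as the last refresh, so this fits best when sales are loaded in batches.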

I guess the subquery

(SELECT customerid FROM sales WHERE orderid NOT LIKE 'online%')

is evaluated repeatedly for every row, returning the same result every time and wasting a lot of time. If the subquery is first factored out into a common table expression (CTE), as in

WITH offcus (id) AS (
  SELECT customerid FROM sales
  WHERE orderid NOT LIKE 'online%')
SELECT ... AND customerid NOT IN (SELECT id FROM offcus) ...

your query may become as fast as your "fast enough" query, though I have not tested this myself. The EXPLAIN command is worth a try, as it gives a clear view of how queries are executed.
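As an untested illustration of both suggestions, the sketch below runs the customer-count half of the question's query under `EXPLAIN (ANALYZE, BUFFERS)` and, as an assumption of mine rather than part of the answer above, replaces `NOT IN` with a correlated `NOT EXISTS`. The aliases `s` and `o` are illustrative.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(DISTINCT s.customerid)
FROM sales s
WHERE s.orderdate BETWEEN NOW() - INTERVAL '3 years' AND NOW()
  AND s.product NOT LIKE 'SHIP%' AND s.product NOT LIKE 'Ship%'
  AND s.product NOT IN ('DONATION', 'Donation')
  AND s.customerid NOT LIKE '111222333%'
  AND NOT EXISTS (
        -- any offline order for the same customer disqualifies the row
        SELECT 1
        FROM sales o
        WHERE o.customerid = s.customerid
          AND o.orderid NOT LIKE 'online%'
      );
```

Keep in mind that `EXPLAIN ANALYZE` actually executes the query, so it takes as long as the query itself. One reason to try `NOT EXISTS` is NULL handling: if the `NOT IN` subquery ever returns a NULL `customerid` (the sample data contains one), `NOT IN` is never true and the outer query returns no rows. An index on `sales (customerid)` may help the correlated lookup, but the plan output is the place to confirm that.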
