Postgres SQL Query With Nested Sub-Query Taking Too Long
I'm working with a database of transaction records that has several million rows and the following columns/setup:
Order Date | Order ID | Customer ID | Product | Price | Total Amount |
---|---|---|---|---|---|
30/02/2018 | online56134 | 492512952 | 125582 | 50 | 50 |
20/05/2020 | offline14452 | 291312855 | 125582 | 50 | 82 |
20/05/2020 | offline14452 | 291312855 | 291824 | 32 | 82 |
15/07/2015 | offline29528 | 192501431 | 693012 | 71 | 71 |
01/09/2017 | offline53422 | 291367825 | Donation | 10 | 20 |
01/09/2017 | offline53422 | 291367825 | 214257 | 10 | 20 |
16/11/2016 | online63642 | NULL | 639102 | 53 | 53 |
01/11/2017 | online96458 | 891367243 | Shipping | 10 | 10 |
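I want to find the average yearly spend of all customers who have made a transaction in the past three years and have never made an offline transaction. I have a query for all customers that runs fast enough: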
SELECT
(SELECT SUM(CAST(total_amount AS NUMERIC)) FROM (SELECT DISTINCT orderid, total_amount, orderdate
FROM sales WHERE orderdate > (NOW() - INTERVAL '12 month') AND customerid IS NOT NULL AND product
NOT LIKE 'SHIP%' AND product NOT LIKE 'Ship%' AND product != 'DONATION' AND product != 'Donation'
AND customerid NOT LIKE '111222333%') AS "Total Sales - Returns"
)
/
(SELECT COUNT(DISTINCT customerid) FROM sales WHERE orderdate BETWEEN (NOW() - INTERVAL '3 years')
AND NOW() AND product NOT LIKE 'SHIP%' AND product NOT LIKE 'Ship%' AND product != 'DONATION' AND
product != 'Donation' AND customerid NOT LIKE '111222333%'
);
However, my solution for online-only customers involves an inefficient nested subquery, which slows the query down considerably:
SELECT
(SELECT SUM(CAST(total_amount AS NUMERIC)) FROM (SELECT DISTINCT orderid, total_amount, orderdate
FROM sales WHERE orderdate > (NOW() - INTERVAL '12 month') AND customerid IS NOT NULL AND product
NOT LIKE 'SHIP%' AND product NOT LIKE 'Ship%' AND product != 'DONATION' AND product != 'Donation'
AND customerid NOT LIKE '111222333%' AND customerid NOT IN (SELECT customerid FROM sales WHERE
orderid NOT LIKE 'online%')) AS "Total Sales - Returns"
)
/
(SELECT COUNT(DISTINCT customerid) FROM sales WHERE orderdate BETWEEN (NOW() - INTERVAL '3 years')
AND NOW() AND product NOT LIKE 'SHIP%' AND product NOT LIKE 'Ship%' AND product != 'DONATION' AND
product != 'Donation' AND customerid NOT LIKE '111222333%' AND customerid NOT IN (SELECT
customerid FROM sales WHERE orderid NOT LIKE 'online%')
);
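Overall, I have many similar queries (e.g. average number of transactions, time between transactions, date of first purchase, etc.). So I need to apply similar logic for online-only customers to many queries, and I also need to exclude online-only customers. In effect there are three sets of queries: one for all customers, one for online-only customers, and one that excludes online-only customers.
Does anyone have suggestions for how I can speed up the query above and my other online-only customer queries?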
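If I followed you correctly, you can get the average yearly amount over the past 3 years, for customers that have only online sales, with the following query: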
select customerid, sum(total_amount) / 3 as avg_year_amount
from sales
where orderdate > current_date - interval '3 year'
group by customerid
having bool_and(orderid like 'online%')
If you want the overall average across those customers, you can add another level of aggregation:
select avg(avg_year_amount) as grand_avg
from (
select customerid, sum(total_amount) / 3 as avg_year_amount
from sales
where orderdate > current_date - interval '3 year'
group by customerid
having bool_and(orderid like 'online%')
) t
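Your query has additional filters in the where clause that are not described in the question. You can add them to the where clause of the subquery as needed.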
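As a rough sketch of what adding those filters could look like, the version below copies the conditions from the question's own query into the subquery. Two things in it are assumptions rather than part of the answer: the `cast(... as numeric)` is taken from the question (which suggests `total_amount` may not be stored as a number), and the inner `group by customerid, orderid` is added because the sample data repeats `total_amount` on every line of an order, which the question handles with `SELECT DISTINCT`.

```sql
-- Sketch only: grouped query with the question's filters added.
-- Assumes the table is named sales, as in the question.
select avg(avg_year_amount) as grand_avg
from (
    select customerid,
           sum(order_total) / 3 as avg_year_amount
    from (
        -- collapse order lines to one row per order so that a
        -- multi-line order's total_amount is counted only once
        select customerid,
               orderid,
               max(cast(total_amount as numeric)) as order_total
        from sales
        where orderdate > current_date - interval '3 year'
          and customerid is not null
          and customerid not like '111222333%'
          and product not like 'SHIP%'
          and product not like 'Ship%'
          and product not in ('DONATION', 'Donation')
        group by customerid, orderid
    ) o
    group by customerid
    having bool_and(orderid like 'online%')
) t;
```

One caveat: `having` only sees rows that pass the `where` clause, so "online only" here means "no offline order among the filtered rows of the last three years", which is looser than the question's "never made an offline transaction".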
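"I want to find the average yearly spend of all customers who have made a transaction in the past three years and have never made an offline transaction."
I don't find this explanation entirely clear. Let me assume you want the total amount spent over the past three years by customers who have never made an offline transaction, divided by three times the number of those customers.
Note: a customer with exactly one transaction of 300, made 2.5 years ago, would count as 100 per year (if included).
Then:
-- count(distinct customerid) so the divisor is the number of customers, not rows
select sum(total_amount) / (3 * count(distinct customerid)) as yearly_average
from (select s.*,
             bool_and(orderid like 'online%') over (partition by customerid) as always_online
      from sales s
     ) s
where always_online and
      orderdate > current_date - interval '3 year';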
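Since the question mentions three sets of queries (all customers, online-only customers, and everyone else), one possible way to avoid repeating the subquery is to compute the per-customer flag once and split on it with `filter`. This is only a sketch: the CTE names `customer_flags` and `recent_orders` are made up for illustration, the numeric cast is borrowed from the question, and the product filters are left out for brevity.

```sql
-- Sketch only: one pass that reports all three customer groups.
with customer_flags as (
    -- one row per customer; the flag looks at every order, not only recent ones
    select customerid,
           bool_and(orderid like 'online%') as always_online
    from sales
    where customerid is not null
    group by customerid
),
recent_orders as (
    -- one row per order placed in the last three years
    select customerid,
           orderid,
           max(cast(total_amount as numeric)) as order_total
    from sales
    where orderdate > current_date - interval '3 year'
      and customerid is not null
    group by customerid, orderid
)
select sum(r.order_total)
         / (3 * count(distinct r.customerid)) as all_customers,
       sum(r.order_total) filter (where f.always_online)
         / (3 * count(distinct r.customerid) filter (where f.always_online)) as online_only,
       sum(r.order_total) filter (where not f.always_online)
         / (3 * count(distinct r.customerid) filter (where not f.always_online)) as not_online_only
from recent_orders r
join customer_flags f on f.customerid = r.customerid;
```

If many reports need the same flag, `customer_flags` could also be kept as a view or a periodically refreshed table, at the cost of keeping it up to date.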
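My guess is that
(SELECT customerid FROM sales WHERE orderid NOT LIKE 'online%')
is evaluated again for every row, returning the same result each time and wasting a lot of time. If the subquery is first put into a temporary result with a WITH clause,
WITH offcus (id) AS (
SELECT customerid FROM sales
WHERE orderid NOT LIKE 'online%')
SELECT ... AND customerid NOT IN (SELECT id FROM offcus) ...
your query might be as fast as your "fast enough" query, although I haven't tested it myself. The EXPLAIN command is worth a try, because it shows clearly how the query is executed.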
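For illustration, here is how the plan could be inspected, using a `NOT EXISTS` form of the customer count as the query under test. The `NOT EXISTS` rewrite is an alternative swapped in for this sketch, not the WITH form suggested above; PostgreSQL can plan it as an anti join, and it also sidesteps a `NOT IN` pitfall: if the subquery ever returns a NULL `customerid`, `NOT IN` matches no rows at all. Note too that from PostgreSQL 12 on, a CTE that is referenced only once is normally inlined, so the WITH form may plan the same as the original unless it is written `AS MATERIALIZED`.

```sql
-- Sketch only: EXPLAIN (ANALYZE, BUFFERS) really executes the query
-- and reports actual row counts, timings and buffer usage.
explain (analyze, buffers)
select count(distinct s.customerid)
from sales s
where s.orderdate > now() - interval '3 years'
  and s.product not like 'SHIP%'
  and s.product not like 'Ship%'
  and s.product not in ('DONATION', 'Donation')
  and s.customerid not like '111222333%'
  and not exists (
      -- any order by the same customer that is not an online order
      select 1
      from sales o
      where o.customerid = s.customerid
        and o.orderid not like 'online%'
  );
```

Whether this is actually faster depends on the data and the available indexes; an index on `sales (customerid)` may help the anti join, but that is something to confirm from the plan output rather than assume.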