PostgreSQL distinct on + order by query optimization

I have a small problem with a query here.

SELECT DISTINCT ON ("reporting_processedamazonsnapshot"."offer_id") *
FROM "reporting_processedamazonsnapshot" INNER JOIN 
     "offers_boooffer"
        ON ("reporting_processedamazonsnapshot"."offer_id" =
            "offers_boooffer"."id") INNER JOIN
     "offers_offersettings"
        ON ("offers_boooffer"."id" = "offers_offersettings"."offer_id")
WHERE "offers_offersettings"."account_id" = 20
ORDER BY "reporting_processedamazonsnapshot"."offer_id" ASC,
         "reporting_processedamazonsnapshot"."scraping_date" DESC

I have an index named latest_scraping on offer_id ASC, scraping_date DESC, but for some reason PostgreSQL still performs a sort after using that index, which causes a huge performance problem.
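The question does not show the DDL for that index. Going only by the description (name latest_scraping, columns offer_id ASC, scraping_date DESC), it would presumably look something like the sketch below; any options such as UNIQUE or a tablespace are unknown and left out.

```sql
-- Assumed definition, reconstructed from the description in the question.
-- Within each offer_id, a forward scan of this index returns rows
-- with the newest scraping_date first.
CREATE INDEX latest_scraping
    ON reporting_processedamazonsnapshot (offer_id ASC, scraping_date DESC);
```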

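I don't understand why it does not use the already sorted data instead of sorting again. Is my index wrong? Or should I try to write the query in another way?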

Here is the EXPLAIN output, including the actual run data:

'Unique  (cost=21260.47..21263.06 rows=519 width=1288) (actual time=38053.685..38177.348 rows=1783 loops=1)'
'  ->  Sort  (cost=21260.47..21261.76 rows=519 width=1288) (actual time=38053.683..38161.478 rows=153095 loops=1)'
'        Sort Key: reporting_processedamazonsnapshot.offer_id, reporting_processedamazonsnapshot.scraping_date DESC'
'        Sort Method: external merge  Disk: 162088kB'
'        ->  Nested Loop  (cost=41.90..21237.06 rows=519 width=1288) (actual time=70.874..36148.348 rows=153095 loops=1)'
'              ->  Nested Loop  (cost=41.47..17547.90 rows=1627 width=8) (actual time=54.287..126.740 rows=1784 loops=1)'
'                    ->  Bitmap Heap Scan on offers_offersettings  (cost=41.04..4823.48 rows=1627 width=4) (actual time=52.532..84.102 rows=1784 loops=1)'
'                          Recheck Cond: (account_id = 20)'
'                          Heap Blocks: exact=38'
'                          ->  Bitmap Index Scan on offers_offersettings_account_id_fff7a8c0  (cost=0.00..40.63 rows=1627 width=0) (actual time=49.886..49.886 rows=4132 loops=1)'
'                                Index Cond: (account_id = 20)'
'                    ->  Index Only Scan using offers_boooffer_pkey on offers_boooffer  (cost=0.43..7.81 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=1784)'
'                          Index Cond: (id = offers_offersettings.offer_id)'
'                          Heap Fetches: 1784'
'              ->  Index Scan using latest_scraping on reporting_processedamazonsnapshot  (cost=0.43..1.69 rows=58 width=1288) (actual time=0.526..20.146 rows=86 loops=1784)'
'                    Index Cond: (offer_id = offers_boooffer.id)'
'Planning time: 187.133 ms'
'Execution time: 38195.266 ms'

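To use the index to avoid the sort, PostgreSQL would first have to scan all of "reporting_processedamazonsnapshot" in index order, then join all of "offers_boooffer" with a nested loop join (so that the order is preserved), and then join all of "offers_offersettings", again with a nested loop join.

Finally, all rows that do not match the condition "offers_offersettings"."account_id" = 20 would be discarded.

PostgreSQL thinks (correctly, in my opinion) that it is more efficient to use the condition to reduce the number of rows as much as possible, then join the tables with the most efficient join method, and then sort for the DISTINCT clause.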
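One way to check this reasoning, as a rough sketch rather than a tuning recommendation, is to discourage explicit sorts for a single transaction and look at the plan and estimated cost the planner comes up with instead. enable_sort = off does not forbid sorting; it only makes sort steps look very expensive to the planner. Plain EXPLAIN (without ANALYZE) is used here so that a potentially very slow plan is not actually executed.

```sql
BEGIN;

-- Applies only until the end of this transaction.
SET LOCAL enable_sort = off;

-- Shows the plan only; compare its estimated total cost with the
-- cost=21260.47..21263.06 of the plan in the question.
EXPLAIN
SELECT DISTINCT ON ("reporting_processedamazonsnapshot"."offer_id") *
FROM "reporting_processedamazonsnapshot"
     INNER JOIN "offers_boooffer"
        ON ("reporting_processedamazonsnapshot"."offer_id" = "offers_boooffer"."id")
     INNER JOIN "offers_offersettings"
        ON ("offers_boooffer"."id" = "offers_offersettings"."offer_id")
WHERE "offers_offersettings"."account_id" = 20
ORDER BY "reporting_processedamazonsnapshot"."offer_id" ASC,
         "reporting_processedamazonsnapshot"."scraping_date" DESC;

ROLLBACK;
```

If the planner can produce a sort-free plan at all, its estimated cost should be noticeably higher than that of the plan it chose on its own, which would be consistent with the explanation above.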

I wonder if the following query might be faster:

SELECT DISTINCT ON (q.offer_id) *
FROM offers_offersettings ofs
   JOIN offers_boooffer bo ON bo.id = ofs.offer_id
   CROSS JOIN LATERAL
      (SELECT *
       FROM reporting_processedamazonsnapshot r
       WHERE r.offer_id = bo.id
       ORDER BY r.scraping_date DESC
       LIMIT 1) q
WHERE ofs.account_id = 20
ORDER BY q.offer_id ASC, q.scraping_date DESC;

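The execution plan would be similar, except that fewer rows have to be scanned from the index, which should reduce the execution time where you need it most.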
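To see whether that actually happens, the LATERAL query can be run under EXPLAIN (ANALYZE, BUFFERS). The sketch below joins the snapshot table on bo.id, which is how the question's original query joins it; that choice is an assumption based on the question, not something confirmed about the schema.

```sql
-- ANALYZE executes the query and reports actual row counts and timings;
-- BUFFERS adds the number of pages read for each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (q.offer_id) *
FROM offers_offersettings ofs
   JOIN offers_boooffer bo ON bo.id = ofs.offer_id
   CROSS JOIN LATERAL
      (SELECT *
       FROM reporting_processedamazonsnapshot r
       WHERE r.offer_id = bo.id
       ORDER BY r.scraping_date DESC
       LIMIT 1) q
WHERE ofs.account_id = 20
ORDER BY q.offer_id ASC, q.scraping_date DESC;
```

The numbers to compare with the plan in the question are the rows per loop on the latest_scraping index scan (86 there) and the number of rows entering the Sort node (153095 there). If the subquery is answered from the index with a Limit on top, both should drop sharply.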

If you want to speed up the sort, increase work_mem to 500MB for that query (if you can afford it).
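A minimal sketch of raising work_mem for this one query only, using SET LOCAL so that the setting reverts when the transaction ends. The 500MB value is the one suggested above; whether the server has that much memory to spare per sort is for you to judge.

```sql
BEGIN;

-- Takes effect only inside this transaction.
SET LOCAL work_mem = '500MB';

-- Optional check that the setting is active.
SHOW work_mem;

-- Run the DISTINCT ON query from the question here.

COMMIT;
```

The plan in the question reports "Sort Method: external merge  Disk: 162088kB", meaning the sort spilled to disk. With enough work_mem, EXPLAIN ANALYZE should report an in-memory sort method instead.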
