PostgreSQL chooses wrong multicolumn index for ordered query
I'm using PostgreSQL 13.4.
I want to run the following query to get, for each campaign in a given set, the date of its most recent effect:
SELECT DISTINCT ON (campaignid)
    campaignid,
    created
FROM
    effects
WHERE
    campaignid IN (1, 2, 3) -- the condition; may include different values
ORDER BY
    campaignid,
    created DESC;
I also have a couple of indexes on that table:
CREATE INDEX effects_campaignid_created_desc_idx ON effects (campaignid, created DESC);
CREATE INDEX effects_created_idx ON effects (created);
Usually, when I run the query, the index effects_campaignid_created_desc_idx is used. Here is the execution plan:
Unique (cost=26172.58..26406.28 rows=5 width=16) (actual time=710.816..712.315 rows=2 loops=1)
  Buffers: shared hit=2480 read=2792
  -> Sort (cost=26172.58..26289.43 rows=46739 width=16) (actual time=710.814..711.355 rows=12200 loops=1)
        Sort Key: campaignid, created DESC
        Sort Method: quicksort Memory: 956kB
        Buffers: shared hit=2480 read=2792
        -> Index Only Scan using effects_campaign_created_desc_idx on effects (cost=0.57..22547.42 rows=46739 width=16) (actual time=0.954..706.329 rows=12200 loops=1)
              Index Cond: (campaignid = ANY ('{1,2,3}'::bigint[]))
              Heap Fetches: 9079
              Buffers: shared hit=2474 read=2792
Planning:
  Buffers: shared hit=145 read=14 dirtied=1
Planning Time: 0.682 ms
Execution Time: 712.417 ms
However, when I add campaign 7 to the list, i.e. campaignid IN (1, 2, 3, 7), the planner switches to a sequential scan followed by a sort. Here is the execution plan (without ANALYZE, because the query runs forever):
Unique (cost=27736480.20..28359558.40 rows=5 width=16)
  -> Sort (cost=27736480.20..28048019.30 rows=124615640 width=16)
        Sort Key: campaignid, created DESC
        -> Seq Scan on effects (cost=0.00..7329244.30 rows=124615640 width=16)
              Filter: (campaignid = ANY ('{1,2,3,7}'::bigint[]))
JIT:
  Functions: 5
  Options: Inlining true, Optimization true, Expressions true, Deforming true
If I query only WHERE campaignid = 7, the query planner changes its behavior again, this time choosing a backward scan of the index effects_created_idx, which is just as inefficient. Here is the execution plan:
Unique (cost=0.57..8071964.32 rows=5 width=16)
  -> Index Scan Backward using effects_created_idx on effects (cost=0.57..8071964.32 rows=124568901 width=16)
        Filter: (campaignid = 7)
Planning:
  Buffers: shared hit=5
JIT:
  Functions: 4
  Options: Inlining true, Optimization true, Expressions true, Deforming true
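As far as I understand, PostgreSQL "knows" that most of the effects have campaignid = 7, and for that reason it chooses not to use the index on campaignid and picks another index or a sequential scan instead.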
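One way to check what the planner "knows" is to look at the column statistics that ANALYZE collected for effects.campaignid; the plans above already hint at the skew, since the row estimate jumps from 46739 for IN (1, 2, 3) to 124615640 once 7 is included. A minimal sketch, assuming the table is on the search_path (the schemaname filter is omitted):

```sql
-- Column statistics gathered by ANALYZE for effects.campaignid.
-- most_common_vals / most_common_freqs show how skewed the distribution is;
-- a very large frequency for campaignid = 7 would explain the seq scan choice.
SELECT most_common_vals,
       most_common_freqs,
       n_distinct,
       correlation
FROM pg_stats
WHERE tablename = 'effects'
  AND attname = 'campaignid';
```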
Is there a way to hint or convince PostgreSQL to use the more efficient index effects_campaignid_created_desc_idx for these queries, regardless of which campaigns I fetch results for?
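PostgreSQL has no built-in query hints, but for diagnosis you can discourage specific plan types for a single transaction with the planner's enable_* settings and compare the resulting plans. This is a sketch for testing only: SET LOCAL affects every query in the transaction, not just this one. Plain EXPLAIN (without ANALYZE) is used because the query may run for a very long time.

```sql
BEGIN;
-- Discourage the plan shapes seen above, for this transaction only.
SET LOCAL enable_seqscan = off;
SET LOCAL enable_sort = off;

EXPLAIN
SELECT DISTINCT ON (campaignid)
       campaignid,
       created
FROM effects
WHERE campaignid IN (1, 2, 3, 7)
ORDER BY campaignid, created DESC;

ROLLBACK;
```

Even if the planner then picks effects_campaignid_created_desc_idx, the Unique node still has to read every index entry for each campaign, so this alone does not make the query fast; that is the skip-scan limitation the first answer describes.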
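To do this efficiently, PostgreSQL would need a skip scan, but it doesn't know how to do one. (People are working on implementing it, but even once that's done, I don't know whether it would apply to DISTINCT ON.)
Until it works automatically, you can get an efficient implementation using LATERAL and LIMIT: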
select *
from (values (1), (2), (3), (7)) f(campaignid)
cross join lateral (
    select created
    from effects
    where f.campaignid = campaignid
    order by created desc
    limit 1
) foo;
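A variant of the same LATERAL/LIMIT idea, sketched under the assumption that the campaign list arrives as a single bigint[] array (the literal here is only an example). It uses LEFT JOIN LATERAL ... ON true so that campaigns with no effects at all still appear with a NULL date; a plain CROSS JOIN LATERAL drops them.

```sql
-- One short index probe per campaign: each LATERAL subquery can walk
-- effects_campaignid_created_desc_idx and stop after the first row.
SELECT f.campaignid,
       latest.created
FROM unnest('{1,2,3,7}'::bigint[]) AS f(campaignid)
LEFT JOIN LATERAL (
    SELECT e.created
    FROM effects e
    WHERE e.campaignid = f.campaignid
    ORDER BY e.created DESC
    LIMIT 1
) AS latest ON true
ORDER BY f.campaignid;
```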
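Try replacing campaignid IN (1, 2, 3) with campaignid = ANY(ARRAY[1, 2, 3]). This made several of my queries faster.
However, what I had was campaignid IN (SELECT UNNEST($1::TEXT[])), and I replaced it with campaignid = ANY($1), so things may be different if the values in your query are constants. (The plans in the question already show PostgreSQL rewriting the constant IN list as campaignid = ANY ('{1,2,3}'::bigint[]).)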
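For reference, a parameterized form of the question's DISTINCT ON query using = ANY with a single array parameter might look like this. The statement name campaign_latest is hypothetical, and the parameter is typed bigint[] to match the column type shown in the plans.

```sql
-- Hypothetical prepared statement taking the campaign list as one bigint[] parameter.
PREPARE campaign_latest(bigint[]) AS
SELECT DISTINCT ON (campaignid)
       campaignid,
       created
FROM effects
WHERE campaignid = ANY($1)
ORDER BY campaignid, created DESC;

-- The untyped array literal is coerced to bigint[].
EXECUTE campaign_latest('{1,2,3}');
```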