Amazon RDS - Postgres not using index for SELECT queries

Question

I have a feeling I'm doing something wrong, but I can't seem to figure out what.

I have the following query that I'm trying to execute:

Select col1, col2, col3, col4, col5, day, month, year,
       sum(num1) as sum_num1, 
       sum(num2) as sum_num2,
       count(*) as count_items
from test_table where day = 10 and month = 5 and year = 2020
group by col1, col2, col3, col4, col5, day, month, year;

I also have an index on day, month, year, created with the following command:

CREATE INDEX CONCURRENTLY testtable_dmy_idx on test_table (day, month, year);

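I then found the setting that turns sequential scans on and off, and tried the query both ways.

With SET enable_seqscan TO on; (which is the default behavior, by the way) and EXPLAIN (analyze,buffers,timing), I get the following output: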

-- Select Query with Sequential scan on 

QUERY PLAN
Finalize GroupAggregate  (cost=9733303.39..10836008.34 rows=5102790 width=89) (actual time=1100914.091..1110820.480 rows=491640 loops=1)
"  Group Key: col1, col2, col3, col4, col5, day, month, year"
"  Buffers: shared hit=25020 read=2793049 dirtied=10040, temp read=74932 written=75039"
  I/O Timings: read=1059425.134
  ->  Gather Merge  (cost=9733303.39..10607468.38 rows=6454984 width=89) (actual time=1100911.426..1110193.876 rows=795097 loops=1)
        Workers Planned: 2
        Workers Launched: 2
"        Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956"
        I/O Timings: read=3178066.529
        ->  Partial GroupAggregate  (cost=9732303.36..9861403.04 rows=3227492 width=89) (actual time=1100791.915..1107668.495 rows=265032 loops=3)
"              Group Key: col1, col2, col3, col4, col5, day, month, year"
"              Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956"
              I/O Timings: read=3178066.529
              ->  Sort  (cost=9732303.36..9740372.09 rows=3227492 width=81) (actual time=1100788.479..1105630.411 rows=2630708 loops=3)
"                    Sort Key: col1, col2, col3, col4, col5"
                    Sort Method: external merge  Disk: 241320kB
                    Worker 0:  Sort Method: external merge  Disk: 246776kB
                    Worker 1:  Sort Method: external merge  Disk: 246336kB
"                    Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956"
                    I/O Timings: read=3178066.529
                    ->  Parallel Seq Scan on test_table  (cost=0.00..9074497.49 rows=3227492 width=81) (actual time=656277.982..1073808.146 rows=2630708 loops=3)
                          Filter: ((day = 10) AND (month = 5) AND (year = 2020))
                          Rows Removed by Filter: 24027044
                          Buffers: shared hit=76855 read=8416561 dirtied=33686
                          I/O Timings: read=3178066.180
Planning Time: 4.017 ms
Execution Time: 1111033.041 ms
Total time - Around 18 minutes

Then, when I run SET enable_seqscan TO off; and execute the same query with EXPLAIN, I get the following:

-- Select Query with Sequential scan off

QUERY PLAN
Finalize GroupAggregate  (cost=10413126.05..11515831.01 rows=5102790 width=89) (actual time=59211.363..66579.750 rows=491640 loops=1)
"  Group Key: col1, col2, col3, col4, col5, day, month, year"
"  Buffers: shared hit=3 read=104091, temp read=77942 written=78052"
  I/O Timings: read=28662.857
  ->  Gather Merge  (cost=10413126.05..11287291.05 rows=6454984 width=89) (actual time=59211.262..65973.857 rows=795178 loops=1)
        Workers Planned: 2
        Workers Launched: 2
"        Buffers: shared hit=33 read=218096, temp read=230092 written=230418"
        I/O Timings: read=51560.508
        ->  Partial GroupAggregate  (cost=10412126.03..10541225.71 rows=3227492 width=89) (actual time=57013.922..62453.555 rows=265059 loops=3)
"              Group Key: col1, col2, col3, col4, col5, day, month, year"
"              Buffers: shared hit=33 read=218096, temp read=230092 written=230418"
              I/O Timings: read=51560.508
              ->  Sort  (cost=10412126.03..10420194.76 rows=3227492 width=81) (actual time=57013.423..60368.530 rows=2630708 loops=3)
"                    Sort Key: col1, col2, col3, col4, col5"
                    Sort Method: external merge  Disk: 246944kB
                    Worker 0:  Sort Method: external merge  Disk: 246120kB
                    Worker 1:  Sort Method: external merge  Disk: 241408kB
"                    Buffers: shared hit=33 read=218096, temp read=230092 written=230418"
                    I/O Timings: read=51560.508
                    ->  Parallel Bitmap Heap Scan on test_table  (cost=527733.84..9754320.16 rows=3227492 width=81) (actual time=18155.864..30957.312 rows=2630708 loops=3)
                          Recheck Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                          Rows Removed by Index Recheck: 1423
                          Heap Blocks: exact=13374 lossy=44328
                          Buffers: shared hit=3 read=218096
                          I/O Timings: read=51560.508
                          ->  Bitmap Index Scan on testtable_dmy_idx  (cost=0.00..525797.34 rows=7745982 width=0) (actual time=18148.218..18148.228 rows=7892123 loops=1)
                                Index Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                                Buffers: shared hit=3 read=46389
                                I/O Timings: read=17368.250
Planning Time: 2.787 ms
Execution Time: 66783.481 ms
Total Time - Around 1 min

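I don't understand why I'm getting this behavior or what I'm doing wrong. I would expect Postgres to optimize the query automatically, but that isn't happening.

Any help would be greatly appreciated.

Edit 1:

More information about the RDS Postgres version: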

SELECT version();

PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit

Edit 2:

Running with SET max_parallel_workers_per_gather TO 0; the default is 2 (as shown by SHOW max_parallel_workers_per_gather):

-- Select Query with Sequential scan ON
QUERY PLAN
GroupAggregate  (cost=11515667.22..11799074.58 rows=5102790 width=89) (actual time=1120868.377..1133231.165 rows=491640 loops=1)
"  Group Key: col1, col2, col3, col4, col5, day, month, year"
"  Buffers: shared hit=92456 read=8400966, temp read=295993 written=296321"
  I/O Timings: read=1041723.362
  ->  Sort  (cost=11515667.22..11535032.17 rows=7745982 width=81) (actual time=1120865.304..1129419.809 rows=7892123 loops=1)
"        Sort Key: col1, col2, col3, col4, col5"
        Sort Method: external merge  Disk: 734304kB
"        Buffers: shared hit=92456 read=8400966, temp read=295993 written=296321"
        I/O Timings: read=1041723.362
        ->  Seq Scan on test_table  (cost=0.00..9888011.58 rows=7745982 width=81) (actual time=663266.269..1070560.993 rows=7892123 loops=1)
              Filter: ((day = 10) AND (month = 5) AND (year = 2020))
              Rows Removed by Filter: 72081131
              Buffers: shared hit=92450 read=8400966
              I/O Timings: read=1041723.362
Planning Time: 5.829 ms
Execution Time: 1133422.968 ms
Total Time - Around 18 mins

Followed by:

-- Select Query with Sequential scan OFF
QUERY PLAN
GroupAggregate  (cost=12190966.21..12474373.57 rows=5102790 width=89) (actual time=109048.306..119255.079 rows=491640 loops=1)
"  Group Key: col1, col2, col3, col4, col5, day, month, year"
"  Buffers: shared hit=3 read=218096, temp read=295993 written=296321"
  I/O Timings: read=55697.723
  ->  Sort  (cost=12190966.21..12210331.17 rows=7745982 width=81) (actual time=109047.621..115468.268 rows=7892123 loops=1)
"        Sort Key: col1, col2, col3, col4, col5"
        Sort Method: external merge  Disk: 734304kB
"        Buffers: shared hit=3 read=218096, temp read=295993 written=296321"
        I/O Timings: read=55697.723
        ->  Bitmap Heap Scan on test_table  (cost=527733.84..10563310.57 rows=7745982 width=81) (actual time=16941.764..62203.367 rows=7892123 loops=1)
              Recheck Cond: ((day = 10) AND (month = 5) AND (year = 2020))
              Rows Removed by Index Recheck: 4270
              Heap Blocks: exact=39970 lossy=131737
              Buffers: shared hit=3 read=218096
              I/O Timings: read=55697.723
              ->  Bitmap Index Scan on testtable_dmy_idx  (cost=0.00..525797.34 rows=7745982 width=0) (actual time=16933.964..16933.964 rows=7892123 loops=1)
                    Index Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                    Buffers: shared hit=3 read=46389
                    I/O Timings: read=16154.294
Planning Time: 3.684 ms
Execution Time: 119440.147 ms
Total Time - Around 2 mins

Edit 3:

I checked the number of inserts, updates, deletes, live tuples, and dead tuples using the following:

SELECT n_tup_ins as "inserts",n_tup_upd as "updates",n_tup_del as "deletes", n_live_tup as "live_tuples", n_dead_tup as "dead_tuples"
FROM pg_stat_user_tables
where relname = 'test_table';

and got the following result:

| inserts     | updates | deletes   | live_tuples | dead_tuples |
|-------------|---------|-----------|-------------|-------------|
| 296590964   | 0       | 412400995 | 79717032    | 7589442     |

I then ran the following command:

VACUUM (VERBOSE, ANALYZE) test_table

and got the following output:

[2020-05-15 18:34:08] [00000] vacuuming "public.test_table"
[2020-05-15 18:37:13] [00000] scanned index "testtable_dmy_idx" to remove 7573896 row versions
[2020-05-15 18:37:56] [00000] scanned index "testtable_unixts_idx" to remove 7573896 row versions
[2020-05-15 18:38:16] [00000] "test_table": removed 7573896 row versions in 166450 pages
[2020-05-15 18:38:16] [00000] index "testtable_dmy_idx" now contains 79973254 row versions in 1103313 pages
[2020-05-15 18:38:16] [00000] index "testtable_unixts_idx" now contains 79973254 row versions in 318288 pages
[2020-05-15 18:38:16] [00000] "test_table": found 99 removable, 2196653 nonremovable row versions in 212987 out of 8493416 pages
[2020-05-15 18:38:16] [00000] vacuuming "pg_toast.pg_toast_25023"
[2020-05-15 18:38:16] [00000] index "pg_toast_25023_index" now contains 0 row versions in 1 pages
[2020-05-15 18:38:16] [00000] "pg_toast_25023": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
[2020-05-15 18:38:16] [00000] analyzing "public.test_table"
[2020-05-15 18:38:27] [00000] "test_table": scanned 30000 of 8493416 pages, containing 282611 live rows and 0 dead rows; 30000 rows in sample, 80011093 estimated total rows
[2020-05-15 18:38:27] completed in 4 m 19 s 58 ms

After that, the same statistics query returns the following:

| inserts   | updates | deletes   | live_tuples | dead_tuples |
|-----------|---------|-----------|-------------|-------------|
| 296590964 | 0       | 412400995 | 80011093    | 0           |

Answer 1

 Rows Removed by Filter: 24027044
 Buffers: shared hit=76855 read=8416561 dirtied=33686
 I/O Timings: read=3178066.180

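In the seq scan, a lot of buffers were dirtied. I'm guessing you haven't vacuumed your table enough recently. Or autovac is falling behind, because you have accepted the default settings, which (until v12) are far too slow for most modern dedicated systems.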
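A minimal sketch of how one might check whether autovacuum is keeping up on this table and make it more aggressive for this table only. The two override values are assumptions chosen for illustration, not settings tuned for this workload:

```sql
-- When was the table last vacuumed/analyzed, manually or by autovacuum?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze,
       n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'test_table';

-- Current global autovacuum trigger and throttling settings.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('autovacuum_vacuum_scale_factor',
               'autovacuum_vacuum_cost_delay',
               'autovacuum_vacuum_cost_limit');

-- Per-table override (illustrative values, assumptions only):
-- trigger a vacuum once about 1% of rows are dead instead of the default 20%,
-- and pause 2 ms between cost-limited work batches.
ALTER TABLE test_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 2
);
```

On RDS, the global autovacuum parameters are changed through the DB parameter group rather than postgresql.conf; the per-table override above can be applied directly with SQL.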

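Also, 24027044 / 8416561 = about 2.85 rows per page. That is an extremely low number. Are your tuples extremely wide? Is your table extremely bloated? Neither of these answers your question, because the planner should know about them and take them into account, but we might need to know them to figure out where the planner is going wrong. (These calculations might be off, as I don't know which numbers are pro-rated per worker and which are not, but I don't think a factor of 3 changes the conclusion that something is weird here.)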
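The arithmetic above, plus a way to ask the catalog the same question, can be sketched as follows. The second calculation uses the non-parallel plan from Edit 2, where the per-worker ambiguity does not arise; the catalog queries rely only on the standard pg_class and pg_stats views:

```sql
-- Figure from the parallel plan: rows removed by the filter / pages read.
-- Cast to numeric to avoid integer division. Gives roughly 2.85.
SELECT round(24027044::numeric / 8416561, 2) AS rows_per_page_parallel;

-- Same idea from the non-parallel plan in Edit 2:
-- (rows removed + rows returned) / (buffers hit + buffers read). Gives roughly 9.4.
SELECT round((72081131 + 7892123)::numeric / (92450 + 8400966), 2)
       AS rows_per_page_serial;

-- The planner's own view of the table: pages, estimated rows, heap size.
SELECT relpages,
       reltuples::bigint AS est_rows,
       round((reltuples / NULLIF(relpages, 0))::numeric, 2) AS est_rows_per_page,
       pg_size_pretty(pg_relation_size(oid)) AS heap_size
FROM pg_class
WHERE relname = 'test_table';

-- Approximate stored row width, summed from per-column statistics.
SELECT sum(avg_width) AS approx_row_bytes
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'test_table';
```

If approx_row_bytes is small while est_rows_per_page stays in the single digits, that points toward bloat rather than wide tuples.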

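8416561 * 1024 * 8 / 3178.066 / 1024 / 1024 ≈ 20 MB/s. That seems rather low. What IO settings have you provisioned on your RDS "hardware"? Your settings for seq_page_cost and random_page_cost may be wrong for your actual IO capacity. (Although changing them might not be very effective, see below.)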
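As a sketch, the throughput estimate and a session-local experiment with the cost settings could look like this. The value 1.1 for random_page_cost is an assumption for illustration, not a measurement of this instance's storage:

```sql
-- Pages read * 8192 bytes per page, divided by read time in seconds, in MB/s.
-- 3178066.180 ms is the I/O time reported on the Parallel Seq Scan node.
-- Gives roughly 20.7.
SELECT round(8416561 * 8192.0 / (3178066.180 / 1000) / (1024 * 1024), 1)
       AS mb_per_s;

-- Current planner cost settings.
SHOW seq_page_cost;
SHOW random_page_cost;
SHOW effective_cache_size;

-- Change the value for this session only and look at the chosen plan.
-- Plain EXPLAIN (without ANALYZE) does not execute the query.
SET random_page_cost = 1.1;  -- illustrative assumption

EXPLAIN
SELECT col1, col2, col3, col4, col5, day, month, year,
       sum(num1) AS sum_num1,
       sum(num2) AS sum_num2,
       count(*)  AS count_items
FROM test_table
WHERE day = 10 AND month = 5 AND year = 2020
GROUP BY col1, col2, col3, col4, col5, day, month, year;

-- Return to the configured value.
RESET random_page_cost;
```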

For your bitmap heap scan:

 Heap Blocks: exact=13374 lossy=44328
 Buffers: shared hit=3 read=218096

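It looks like all the qualifying tuples are concentrated in a very small number of blocks (compared to the overall table size shown by the seq scan). I think the planner does not adequately take this into account for bitmap scans. There is a patch out there for this, but it missed the deadline for v13. (If no one reviews it, it might miss the deadline for v14 as well, nudge nudge.) Basically, the planner knows that the "day" column is highly correlated with the physical order of the table, and it uses this knowledge to conclude that the bitmap heap scan will be almost entirely sequential IO. But it fails to also reason that it will scan only a small part of the table. This makes the bitmap scan look just like the seq scan, but with an extra layer of overhead (consulting the index), so it is not surprising that the planner doesn't choose it.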
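The correlation the planner is working from can be inspected in pg_stats. The sketch below also shows one way to limit the enable_seqscan workaround from the question to a single transaction instead of the whole session; it is a blunt instrument and is offered as an assumption about what might be acceptable here, not as a fix for the underlying costing issue:

```sql
-- Physical-order correlation of each filter column, as seen by the planner.
-- Values near +1 or -1 mean the column's order closely tracks the heap order.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'test_table'
  AND attname IN ('day', 'month', 'year');

-- Discourage the seq scan for this one query only.
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL enable_seqscan = off;

SELECT col1, col2, col3, col4, col5, day, month, year,
       sum(num1) AS sum_num1,
       sum(num2) AS sum_num2,
       count(*)  AS count_items
FROM test_table
WHERE day = 10 AND month = 5 AND year = 2020
GROUP BY col1, col2, col3, col4, col5, day, month, year;

COMMIT;
```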

Answer 2

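In general, and for your query in particular, COUNT(*) and SUM(...) in a GROUP BY query tend to be performance killers. The reason is that, to get the count and sums for each multi-column group, Postgres has to touch the representation of every record in the index. As a result, Postgres cannot logically eliminate any records, and it tends not to use an index in this situation.

A case where an index would be used in a GROUP BY query is one where the query has a HAVING clause that uses the MIN or MAX of some column. Also, if your query had a WHERE clause, an index might be usable there. But your current query can't be optimized much.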

2 answers

Answer 1
1 2020-05-15 15:48:53

Answer 2
0 2020-05-15 09:59:44
