The next query after a large DELETE is slow. Why does a SELECT seem to trigger WAL file archiving?

Question

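What is going on here? Specifically:

Why is the SELECT after the DELETE so slow? I know it now has to navigate dead tuples, but is that really so much more expensive?
Why does the SELECT appear to cause WAL files to be archived? This is unexpected, since a SELECT should not generate any WAL.
The SELECT appears to wait for the WAL files to be archived before it returns its answer. Why?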

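Environment:

PostgreSQL v 14.2
Log level = debug1 (so I can see the WAL archiving activity).

I have one streaming replica and a WAL archive directory.

I create a simple table that will hold 50 million rows.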

CREATE TABLE IF NOT EXISTS bigt
(
    a integer,
    b integer,
    des text
);

Now populate the table with some data (50 million rows is enough):

insert into bigt select i,mod(i,4),md5( mod(i,4)::text) from generate_series(1,50000000) i;

Let's see how long it takes to query it (scan it):

select count(*) from bigt;
  count
----------
 50000000
(1 row)

Time: 1459.486 ms (00:01.459)

OK, so just under 1.5 seconds (note that, because of caching, this will of course be faster on the second and subsequent runs).

Now I delete half the rows in the table, then re-run the count and watch what happens:

delete from bigt where a>25000000;
DELETE 25000000
Time: 41669.131 ms (00:41.669)
mydb=# select count(*) from bigt;
  count
----------
 25000000
(1 row)

Time: 22453.483 ms (00:22.453)

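Wow. 22.5 seconds. I was tailing the PostgreSQL log file at the same time. After the DELETE statement I waited 10 seconds before running the SELECT. The act of running the SELECT appears to cause a series of log lines such as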

DEBUG:  archived write-ahead log file "0000000100000023000000F6"

After about 10 seconds of these lines, the SELECT completes, 22.5 seconds after it started!

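Update: Since I only suspected that the SELECT was triggering something, I simplified the scenario and, to rule out autovacuum kicking in during the test, disabled autovacuum for this test table.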

Empty the table (truncate). Same schema as before.

mydb=# ALTER TABLE bigt SET (autovacuum_enabled = off);
ALTER TABLE
Time: 2.436 ms
mydb=# truncate bigt;
TRUNCATE TABLE
Time: 196.077 ms

Insert 30 million rows:

mydb=# insert into bigt select i,mod(i,4),md5( mod(i,4)::text) from generate_series(1,30000000) i;
INSERT 0 30000000
Time: 57994.379 ms (00:57.994)

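Wait a few seconds, then issue the SELECT (I was tailing the log, and this SELECT appears to cause archiving activity).

This time I ran it with the EXPLAIN ANALYZE options I normally use: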

db=# explain (analyze,buffers,verbose) select count(*) from bigt;
                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=384761.97..384761.98 rows=1 width=8) (actual time=22437.498..22463.793 rows=1 loops=1)
   Output: count(*)
   Buffers: shared read=280374 dirtied=280374 written=94611
   I/O Timings: read=59174.975 write=1503.423
   ->  Gather  (cost=384761.91..384761.96 rows=4 width=8) (actual time=22437.135..22463.785 rows=5 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 4
         Workers Launched: 4
         Buffers: shared read=280374 dirtied=280374 written=94611
         I/O Timings: read=59174.975 write=1503.423
         ->  Partial Aggregate  (cost=383761.91..383761.92 rows=1 width=8) (actual time=22410.083..22410.083 rows=1 loops=5)
               Output: PARTIAL count(*)
               Buffers: shared read=280374 dirtied=280374 written=94611
               I/O Timings: read=59174.975 write=1503.423
               Worker 0:  actual time=22403.414..22403.415 rows=1 loops=1
                 Buffers: shared read=55891 dirtied=55891 written=18963
                 I/O Timings: read=11813.538 write=323.569
               Worker 1:  actual time=22403.428..22403.429 rows=1 loops=1
                 Buffers: shared read=55196 dirtied=55196 written=18936
                 I/O Timings: read=11892.729 write=322.704
               Worker 2:  actual time=22403.400..22403.401 rows=1 loops=1
                 Buffers: shared read=55584 dirtied=55584 written=18621
                 I/O Timings: read=11842.255 write=317.909
               Worker 3:  actual time=22403.248..22403.249 rows=1 loops=1
                 Buffers: shared read=55424 dirtied=55424 written=18837
                 I/O Timings: read=11814.539 write=288.189
               ->  Parallel Seq Scan on public.bigt  (cost=0.00..363084.33 rows=8271033 width=0) (actual time=0.354..21911.602 rows=6000000 loops=5)
                     Output: a, b, des
                     Buffers: shared read=280374 dirtied=280374 written=94611
                     I/O Timings: read=59174.975 write=1503.423
                     Worker 0:  actual time=0.174..21909.475 rows=5980319 loops=1
                       Buffers: shared read=55891 dirtied=55891 written=18963
                       I/O Timings: read=11813.538 write=323.569
                     Worker 1:  actual time=0.526..21914.253 rows=5905972 loops=1
                       Buffers: shared read=55196 dirtied=55196 written=18936
                       I/O Timings: read=11892.729 write=322.704
                     Worker 2:  actual time=0.519..21908.078 rows=5947488 loops=1
                       Buffers: shared read=55584 dirtied=55584 written=18621
                       I/O Timings: read=11842.255 write=317.909
                     Worker 3:  actual time=0.525..21909.759 rows=5930368 loops=1
                       Buffers: shared read=55424 dirtied=55424 written=18837
                       I/O Timings: read=11814.539 write=288.189
 Query Identifier: -3522295412005428879
 Planning:
   Buffers: shared hit=7 read=4 dirtied=1
   I/O Timings: read=0.025
 Planning Time: 0.312 ms
 Execution Time: 22463.902 ms
(48 rows)

Time: 22465.325 ms (00:22.465)

The log file extract shows the SELECT and its effect on WAL file archiving. The "explain select" appears to trigger a lot of WAL archiving activity.

2022-08-22 17:13:23.788 BST [26503]  DEBUG:  archived write-ahead log file "00000001000000260000004D"
2022-08-22 17:13:23.997 BST [26503]  DEBUG:  archived write-ahead log file "00000001000000260000004E"
2022-08-22 17:13:24.412 BST [26503]  DEBUG:  archived write-ahead log file "00000001000000260000004F"
2022-08-22 17:13:24.635 BST [26503]  DEBUG:  archived write-ahead log file "000000010000002600000050"
2022-08-22 17:13:25.037 BST [61194] usr LOG:  duration: 57994.310 ms
2022-08-22 17:13:30.715 BST [61194] usr LOG:  statement: explain (analyze,buffers,verbose) select count(*) from bigt;
2022-08-22 17:13:30.716 BST [26496]  DEBUG:  registering background worker "parallel worker for PID 61194"
2022-08-22 17:13:30.716 BST [26496]  DEBUG:  registering background worker "parallel worker for PID 61194"
2022-08-22 17:13:30.716 BST [26496]  DEBUG:  registering background worker "parallel worker for PID 61194"
2022-08-22 17:13:30.716 BST [26496]  DEBUG:  registering background worker "parallel worker for PID 61194"
2022-08-22 17:13:30.716 BST [26496]  DEBUG:  starting background worker process "parallel worker for PID 61194"
2022-08-22 17:13:30.716 BST [26496]  DEBUG:  starting background worker process "parallel worker for PID 61194"
2022-08-22 17:13:30.717 BST [26496]  DEBUG:  starting background worker process "parallel worker for PID 61194"
2022-08-22 17:13:30.717 BST [26496]  DEBUG:  starting background worker process "parallel worker for PID 61194"
2022-08-22 17:13:30.830 BST [26503]  DEBUG:  archived write-ahead log file "000000010000002600000051"
2022-08-22 17:13:30.880 BST [26503]  DEBUG:  archived write-ahead log file "000000010000002600000052"
2022-08-22 17:13:31.067 BST [26503]  DEBUG:  archived write-ahead log file "000000010000002600000053"

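I thought that was instructive. So I then ran a manual VACUUM on the table (although it reported removing 0 rows, which is what I would expect, since I had only just inserted the 30 million rows).

I then repeated the EXPLAIN SELECT, and this time the plan showed no buffers being "dirtied". I think that is the biggest clue.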

db=# explain (analyze,buffers,verbose) select count(*) from bigt;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=375124.06..375124.07 rows=1 width=8) (actual time=1030.392..1036.682 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=65150 read=215224
   I/O Timings: read=495.941
   ->  Gather  (cost=375124.00..375124.05 rows=4 width=8) (actual time=1030.270..1036.676 rows=5 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 4
         Workers Launched: 4
         Buffers: shared hit=65150 read=215224
         I/O Timings: read=495.941
         ->  Partial Aggregate  (cost=374124.00..374124.01 rows=1 width=8) (actual time=1010.259..1010.260 rows=1 loops=5)
               Output: PARTIAL count(*)
               Buffers: shared hit=65150 read=215224
               I/O Timings: read=495.941
               Worker 0:  actual time=1005.289..1005.289 rows=1 loops=1
                 Buffers: shared hit=12674 read=42481
                 I/O Timings: read=97.199
               Worker 1:  actual time=1005.322..1005.323 rows=1 loops=1
                 Buffers: shared hit=12833 read=43200
                 I/O Timings: read=99.123
               Worker 2:  actual time=1005.319..1005.320 rows=1 loops=1
                 Buffers: shared hit=12864 read=42989
                 I/O Timings: read=100.243
               Worker 3:  actual time=1005.289..1005.290 rows=1 loops=1
                 Buffers: shared hit=12793 read=43094
                 I/O Timings: read=98.914
               ->  Parallel Seq Scan on public.bigt  (cost=0.00..355374.00 rows=7500000 width=0) (actual time=0.140..613.439 rows=6000000 loops=5)
                     Output: a, b, des
                     Buffers: shared hit=65150 read=215224
                     I/O Timings: read=495.941
                     Worker 0:  actual time=0.204..614.154 rows=5901585 loops=1
                       Buffers: shared hit=12674 read=42481
                       I/O Timings: read=97.199
                     Worker 1:  actual time=0.171..613.932 rows=5995531 loops=1
                       Buffers: shared hit=12833 read=43200
                       I/O Timings: read=99.123
                     Worker 2:  actual time=0.132..613.952 rows=5976271 loops=1
                       Buffers: shared hit=12864 read=42989
                       I/O Timings: read=100.243
                     Worker 3:  actual time=0.179..611.561 rows=5979909 loops=1
                       Buffers: shared hit=12793 read=43094
                       I/O Timings: read=98.914
 Query Identifier: -3522295412005428879
 Planning:
   Buffers: shared hit=1 read=1
   I/O Timings: read=0.663
 Planning Time: 1.231 ms
 Execution Time: 1036.729 ms
(48 rows)

Time: 1040.606 ms (00:01.041)

Strangely, the query still triggers all of this WAL archiving activity. It is reliably repeatable. I just don't understand why.
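
One way to check whether the SELECT itself writes WAL, rather than inferring it from the archiver's log lines, is the WAL option of EXPLAIN, which exists in PostgreSQL 13 and later. This is a minimal sketch against the bigt table from this question. The numbers it reports will depend on the state of the table at the time.

```sql
-- WAL adds a "WAL: records=... fpi=... bytes=..." line to each plan node
-- that generated write-ahead log. The line is omitted when all counters are zero.
-- The WAL option is only accepted together with ANALYZE.
EXPLAIN (ANALYZE, BUFFERS, WAL)
SELECT count(*) FROM bigt;
```

If the first run after a bulk INSERT or DELETE reports WAL usage and an immediate repeat does not, the WAL came from page changes made by the query itself.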

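The clue that the SELECT was dirtying some buffers led me to read about the setting of the "visible to all" bit in each row header. So I need to go and read up on that, as it sounds relevant. Thanks for all your help!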
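
The per-tuple flags involved are usually called hint bits, and they are stored in the t_infomask field of each tuple header. As a hedged sketch, the pageinspect contrib extension (which needs superuser rights) can show whether they are set on a given page. The bit values are the ones defined in PostgreSQL's htup_details.h, and block 0 of bigt is an arbitrary choice for illustration.

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Inspect the first few tuples on block 0 of bigt.
SELECT lp,
       t_xmin,
       t_xmax,
       (t_infomask & 256)  <> 0 AS xmin_committed,  -- HEAP_XMIN_COMMITTED (0x0100)
       (t_infomask & 512)  <> 0 AS xmin_invalid,    -- HEAP_XMIN_INVALID   (0x0200)
       (t_infomask & 1024) <> 0 AS xmax_committed,  -- HEAP_XMAX_COMMITTED (0x0400)
       (t_infomask & 2048) <> 0 AS xmax_invalid     -- HEAP_XMAX_INVALID   (0x0800)
FROM heap_page_items(get_raw_page('bigt', 0))
ORDER BY lp
LIMIT 5;
```

Run right after the bulk INSERT commits, xmin_committed would be expected to be false. After the first scan or a VACUUM has visited the page, it would be expected to be true.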

Answer 1

Non-default settings are: ... wal_log_hints=on,

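Well, that is the answer to why there is archiving. A SELECT on a dirty table sets hint bits, and with that setting turned on, doing so generates WAL, and that WAL then needs to be archived.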
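
To confirm which of the relevant settings are active on a given server, a query along these lines should do. It is a sketch, not part of the original answer. data_checksums is included because enabling checksums causes hint-bit changes to be WAL-logged in the same way as wal_log_hints.

```sql
-- With wal_log_hints = on (or data checksums enabled), the first hint-bit
-- change to a page after a checkpoint writes a full-page image to WAL.
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_log_hints', 'data_checksums', 'archive_mode')
ORDER BY name;
```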

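The SELECT does not wait for the archiving to happen. But by triggering the archiving, it has to compete with it for resources. That is probably not the main cause of the slowness, though. Even if it generated no WAL, setting the hint bits would still consume CPU and I/O.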
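
The archiver is a separate background process, which is consistent with the SELECT not waiting on it. As a rough way to watch it from SQL instead of the debug log, the pg_stat_archiver view can be sampled before and after the query. This is only a sketch, and on a busy server other sessions will move the counters as well.

```sql
-- Sample once before and once after the SELECT. A growing archived_count
-- means WAL segments were completed and archived in between.
SELECT archived_count,
       last_archived_wal,
       last_archived_time,
       failed_count
FROM pg_stat_archiver;
```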

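There have been proposals to add a setting that limits how many hint bits a SELECT is willing to set, but I don't think any of them ever went anywhere. On the one hand, the SELECT has already done the work of determining that the tuple is not visible to it, so it should set the bit so that future SELECTs do not have to repeat that determination. On the other hand, setting hint bits is better done by autovacuum (which no one is waiting on), so why should a SELECT annoy its client just to steal part of autovacuum's work?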
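
If the goal is simply to keep the first reader from paying this cost, one option is to vacuum the table as part of the bulk-load or bulk-delete job, so that the hint bits are set before any client query arrives. This is a sketch under that assumption about the workload, not something the answer prescribes. It matches the question's observation that the SELECT dirtied no buffers once a manual VACUUM had run.

```sql
-- Undo the per-table override from the test so autovacuum can process bigt again.
ALTER TABLE bigt RESET (autovacuum_enabled);

-- Remove dead tuples and set hint bits now, while no client is waiting on a query.
VACUUM (VERBOSE, ANALYZE) bigt;
```

The VACUUM still does the page writes and generates the WAL. The work is moved off the client's query, not eliminated.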
