Next query after large delete is slow. Why does a SELECT trigger WAL file archiving?
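What is going on here? Specifically:
Environment:
I have one streaming replica and a WAL archive directory.
I created a simple table that will hold 50 million rows.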
CREATE TABLE IF NOT EXISTS bigt
(
a integer,
b integer,
des text
);
Now populate the table with some data (50 million rows is enough):
insert into bigt select i,mod(i,4),md5( mod(i,4)::text) from generate_series(1,50000000) i;
Let's see how long it takes to query it (a full scan):
select count(*) from bigt;
count
----------
50000000
(1 row)
Time: 1459.486 ms (00:01.459)
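OK, so under 1.5 seconds (note that this will of course be faster on the second and subsequent queries because of caching).
Now I delete half the rows in the table, then re-query the row count and watch what happens: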
delete from bigt where a>25000000;
DELETE 25000000
Time: 41669.131 ms (00:41.669)
mydb=# select count(*) from bigt;
count
----------
25000000
(1 row)
Time: 22453.483 ms (00:22.453)
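Wow. 22.5 seconds. I was tailing the PostgreSQL log file at the same time. After the DELETE statement, I waited 10 seconds before running the SELECT. Running the SELECT appears to cause a series of log lines such as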
DEBUG: archived write-ahead log file "0000000100000023000000F6"
Some 10 seconds after these lines, the SELECT completes, having taken 22.5 seconds!
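UPDATE: Suspecting that the SELECT itself triggers something, I simplified the scenario and, to rule out autovacuum kicking in during the test, disabled autovacuum for this test table.
Empty the table (truncate). Same schema as before.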
mydb=# ALTER TABLE bigt SET (autovacuum_enabled = off);
ALTER TABLE
Time: 2.436 ms
mydb=# truncate bigt;
TRUNCATE TABLE
Time: 196.077 ms
Insert 30 million rows:
mydb=# insert into bigt select i,mod(i,4),md5( mod(i,4)::text) from generate_series(1,30000000) i;
INSERT 0 30000000
Time: 57994.379 ms (00:57.994)
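Wait a few seconds, then issue the SELECT (I am tailing the log, and this SELECT appears to cause the archiving activity).
This time I ran it with the EXPLAIN (ANALYZE, ...) options I normally use: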
db=# explain (analyze,buffers,verbose) select count(*) from bigt;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=384761.97..384761.98 rows=1 width=8) (actual time=22437.498..22463.793 rows=1 loops=1)
  Output: count(*)
  Buffers: shared read=280374 dirtied=280374 written=94611
  I/O Timings: read=59174.975 write=1503.423
  -> Gather (cost=384761.91..384761.96 rows=4 width=8) (actual time=22437.135..22463.785 rows=5 loops=1)
        Output: (PARTIAL count(*))
        Workers Planned: 4
        Workers Launched: 4
        Buffers: shared read=280374 dirtied=280374 written=94611
        I/O Timings: read=59174.975 write=1503.423
        -> Partial Aggregate (cost=383761.91..383761.92 rows=1 width=8) (actual time=22410.083..22410.083 rows=1 loops=5)
              Output: PARTIAL count(*)
              Buffers: shared read=280374 dirtied=280374 written=94611
              I/O Timings: read=59174.975 write=1503.423
              Worker 0: actual time=22403.414..22403.415 rows=1 loops=1
                Buffers: shared read=55891 dirtied=55891 written=18963
                I/O Timings: read=11813.538 write=323.569
              Worker 1: actual time=22403.428..22403.429 rows=1 loops=1
                Buffers: shared read=55196 dirtied=55196 written=18936
                I/O Timings: read=11892.729 write=322.704
              Worker 2: actual time=22403.400..22403.401 rows=1 loops=1
                Buffers: shared read=55584 dirtied=55584 written=18621
                I/O Timings: read=11842.255 write=317.909
              Worker 3: actual time=22403.248..22403.249 rows=1 loops=1
                Buffers: shared read=55424 dirtied=55424 written=18837
                I/O Timings: read=11814.539 write=288.189
              -> Parallel Seq Scan on public.bigt (cost=0.00..363084.33 rows=8271033 width=0) (actual time=0.354..21911.602 rows=6000000 loops=5)
                    Output: a, b, des
                    Buffers: shared read=280374 dirtied=280374 written=94611
                    I/O Timings: read=59174.975 write=1503.423
                    Worker 0: actual time=0.174..21909.475 rows=5980319 loops=1
                      Buffers: shared read=55891 dirtied=55891 written=18963
                      I/O Timings: read=11813.538 write=323.569
                    Worker 1: actual time=0.526..21914.253 rows=5905972 loops=1
                      Buffers: shared read=55196 dirtied=55196 written=18936
                      I/O Timings: read=11892.729 write=322.704
                    Worker 2: actual time=0.519..21908.078 rows=5947488 loops=1
                      Buffers: shared read=55584 dirtied=55584 written=18621
                      I/O Timings: read=11842.255 write=317.909
                    Worker 3: actual time=0.525..21909.759 rows=5930368 loops=1
                      Buffers: shared read=55424 dirtied=55424 written=18837
                      I/O Timings: read=11814.539 write=288.189
Query Identifier: -3522295412005428879
Planning:
  Buffers: shared hit=7 read=4 dirtied=1
  I/O Timings: read=0.025
Planning Time: 0.312 ms
Execution Time: 22463.902 ms
(48 rows)
Time: 22465.325 ms (00:22.465)
The log file extract below shows the SELECT and its effect on WAL archiving. The EXPLAIN SELECT appears to trigger a lot of WAL archiving activity.
2022-08-22 17:13:23.788 BST [26503] DEBUG: archived write-ahead log file "00000001000000260000004D"
2022-08-22 17:13:23.997 BST [26503] DEBUG: archived write-ahead log file "00000001000000260000004E"
2022-08-22 17:13:24.412 BST [26503] DEBUG: archived write-ahead log file "00000001000000260000004F"
2022-08-22 17:13:24.635 BST [26503] DEBUG: archived write-ahead log file "000000010000002600000050"
2022-08-22 17:13:25.037 BST [61194] usr LOG: duration: 57994.310 ms
2022-08-22 17:13:30.715 BST [61194] usr LOG: statement: explain (analyze,buffers,verbose) select count(*) from bigt;
2022-08-22 17:13:30.716 BST [26496] DEBUG: registering background worker "parallel worker for PID 61194"
2022-08-22 17:13:30.716 BST [26496] DEBUG: registering background worker "parallel worker for PID 61194"
2022-08-22 17:13:30.716 BST [26496] DEBUG: registering background worker "parallel worker for PID 61194"
2022-08-22 17:13:30.716 BST [26496] DEBUG: registering background worker "parallel worker for PID 61194"
2022-08-22 17:13:30.716 BST [26496] DEBUG: starting background worker process "parallel worker for PID 61194"
2022-08-22 17:13:30.716 BST [26496] DEBUG: starting background worker process "parallel worker for PID 61194"
2022-08-22 17:13:30.717 BST [26496] DEBUG: starting background worker process "parallel worker for PID 61194"
2022-08-22 17:13:30.717 BST [26496] DEBUG: starting background worker process "parallel worker for PID 61194"
2022-08-22 17:13:30.830 BST [26503] DEBUG: archived write-ahead log file "000000010000002600000051"
2022-08-22 17:13:30.880 BST [26503] DEBUG: archived write-ahead log file "000000010000002600000052"
2022-08-22 17:13:31.067 BST [26503] DEBUG: archived write-ahead log file "000000010000002600000053"
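I think this is quite revealing. So I then ran a manual VACUUM on the table (it reported removing 0 rows, which I expected, since I had only just inserted the 30 million rows).
Then I repeated the EXPLAIN SELECT, and this time the plan shows no buffers being "dirtied". I think that is the biggest clue.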
db=# explain (analyze,buffers,verbose) select count(*) from bigt;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=375124.06..375124.07 rows=1 width=8) (actual time=1030.392..1036.682 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=65150 read=215224
  I/O Timings: read=495.941
  -> Gather (cost=375124.00..375124.05 rows=4 width=8) (actual time=1030.270..1036.676 rows=5 loops=1)
        Output: (PARTIAL count(*))
        Workers Planned: 4
        Workers Launched: 4
        Buffers: shared hit=65150 read=215224
        I/O Timings: read=495.941
        -> Partial Aggregate (cost=374124.00..374124.01 rows=1 width=8) (actual time=1010.259..1010.260 rows=1 loops=5)
              Output: PARTIAL count(*)
              Buffers: shared hit=65150 read=215224
              I/O Timings: read=495.941
              Worker 0: actual time=1005.289..1005.289 rows=1 loops=1
                Buffers: shared hit=12674 read=42481
                I/O Timings: read=97.199
              Worker 1: actual time=1005.322..1005.323 rows=1 loops=1
                Buffers: shared hit=12833 read=43200
                I/O Timings: read=99.123
              Worker 2: actual time=1005.319..1005.320 rows=1 loops=1
                Buffers: shared hit=12864 read=42989
                I/O Timings: read=100.243
              Worker 3: actual time=1005.289..1005.290 rows=1 loops=1
                Buffers: shared hit=12793 read=43094
                I/O Timings: read=98.914
              -> Parallel Seq Scan on public.bigt (cost=0.00..355374.00 rows=7500000 width=0) (actual time=0.140..613.439 rows=6000000 loops=5)
                    Output: a, b, des
                    Buffers: shared hit=65150 read=215224
                    I/O Timings: read=495.941
                    Worker 0: actual time=0.204..614.154 rows=5901585 loops=1
                      Buffers: shared hit=12674 read=42481
                      I/O Timings: read=97.199
                    Worker 1: actual time=0.171..613.932 rows=5995531 loops=1
                      Buffers: shared hit=12833 read=43200
                      I/O Timings: read=99.123
                    Worker 2: actual time=0.132..613.952 rows=5976271 loops=1
                      Buffers: shared hit=12864 read=42989
                      I/O Timings: read=100.243
                    Worker 3: actual time=0.179..611.561 rows=5979909 loops=1
                      Buffers: shared hit=12793 read=43094
                      I/O Timings: read=98.914
Query Identifier: -3522295412005428879
Planning:
  Buffers: shared hit=1 read=1
  I/O Timings: read=0.663
Planning Time: 1.231 ms
Execution Time: 1036.729 ms
(48 rows)
Time: 1040.606 ms (00:01.041)
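What is strange is that a query triggers all this WAL archiving activity at all. It is reliably repeatable. I just don't understand why.
The clue that the SELECT is dirtying buffers led me to read about the setting of the "all visible" bits in each row header. So I need to go and read up on that, as it sounds relevant. Thanks, all, for your help!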
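For anyone following the same lead: hint bits live in each tuple header, whereas the all-visible flag is kept per page and tracked in the visibility map, so the two are related but separate. Below is a minimal sketch for inspecting the page-level side. It assumes the pg_visibility contrib extension can be installed on the server (not something stated in the question) and reuses the bigt table from above.

```sql
-- Assumption: the pg_visibility contrib module is available on this server.
CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- Count how many pages of bigt the visibility map marks as
-- all-visible and all-frozen.
SELECT count(*)                            AS total_pages,
       count(*) FILTER (WHERE all_visible) AS all_visible_pages,
       count(*) FILTER (WHERE all_frozen)  AS all_frozen_pages
FROM pg_visibility_map('bigt');

-- The catalog's stored estimate of the same thing (updated by VACUUM/ANALYZE).
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'bigt';
```

Running this before and after the manual VACUUM should show the all-visible count going up; as far as I know a plain SELECT only sets hint bits and leaves the visibility map to VACUUM.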
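Non-default settings are: ... wal_log_hints=on,
Well, that is the answer to why it archives. A SELECT on a dirty table sets hint bits, and with that setting on, doing so generates WAL, and that WAL needs to be archived.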
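One way to check this on your own system is to compare the WAL position before and after the SELECT. This is only a rough sketch: it assumes psql (for \gset), PostgreSQL 10 or later (for pg_current_wal_lsn), and an otherwise idle server, since WAL written by other sessions is counted too. The variable name lsn_before is made up for the example.

```sql
-- Confirm the setting in effect (data checksums force the same behaviour).
SHOW wal_log_hints;

-- Remember the current WAL write position in a psql variable.
SELECT pg_current_wal_lsn() AS lsn_before \gset

-- The first scan after the bulk DELETE/INSERT, which sets the hint bits.
SELECT count(*) FROM bigt;

-- How much WAL was written in between (all sessions, not just this one).
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), :'lsn_before')
       ) AS wal_generated;
```

If the explanation above is right, the first run after the bulk change should report a large amount of WAL and a repeat run should report close to nothing.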
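The SELECT does not wait for the archiving to happen. But by triggering archiving, it has to compete with it for resources. That is probably not the main cause of the slowness, though. Even if it generated no WAL, setting the hint bits would still consume CPU and IO.
There was a proposal to add a setting that limits how many hint bits a SELECT is willing to set, but I don't think it was ever accepted. On one hand, the SELECT has already done the work of determining that the tuple is not visible to it, so it should set the bit and spare future SELECTs from repeating that determination. On the other hand, setting hint bits is better done by autovacuum (no one is waiting on it), so why should a SELECT annoy its client just to take over part of autovacuum's work?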
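If you control the bulk job, you can do that work yourself instead of leaving it to whichever SELECT comes first. A sketch, assuming the bulk DELETE or INSERT has already committed and that nobody minds the VACUUM taking a while:

```sql
-- Run right after the bulk DELETE/INSERT has committed.
-- VACUUM visits the modified pages, sets hint bits, and removes dead rows,
-- so the cost is paid here rather than by a client query.
VACUUM (VERBOSE, ANALYZE) bigt;

-- The next scan should then report no "dirtied" buffers.
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM bigt;
```

This matches what the question observed after the manual VACUUM: the same count(*) dropped from about 22 seconds to about 1 second and the plan no longer showed dirtied buffers.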