[英]Why is select query is very slow in Postgres?
我有一個簡單的 Postgres 表。 計算總記錄的簡單查詢需要很長時間。 我在表中有 750 萬條記錄,我使用 8 個 vCPU,32 GB memory 機器。 數據庫在同一台機器上。
編輯:添加查詢。
以下查詢非常慢:
SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000
Output的解釋
$ explain SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000
---------------------------------------------------------------------------------------------------------
Limit (cost=5.42..49915.17 rows=10000 width=1985)
-> Index Scan using import_csv_id_idx on import_csv (cost=0.43..19144730.02 rows=3835870 width=1985)
Filter: (NOT processed)
(3 rows)
我的表格如下:
Column | Type | Collation | Nullable | Default
-------------------+----------------+-----------+----------+---------
id | integer | | |
name | character(500) | | |
domain | character(500) | | |
year_founded | real | | |
industry | character(500) | | |
size_range | character(500) | | |
locality | character(500) | | |
country | character(500) | | |
linkedinurl | character(500) | | |
employees | integer | | |
processed | boolean | | not null | false
employee_estimate | integer | | |
Indexes:
"import_csv_id_idx" btree (id)
"processed_idx" btree (processed)
謝謝
編輯3:
# explain analyze SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5.42..49915.33 rows=10000 width=1985) (actual time=8331.070..8355.556 rows=10000 loops=1)
-> Index Scan using import_csv_id_idx on import_csv (cost=0.43..19144790.06 rows=3835870 width=1985) (actual time=8331.067..8354.874 rows=10001 loops=1)
Filter: (NOT processed)
Rows Removed by Filter: 3482252
Planning time: 0.081 ms
Execution time: 8355.925 ms
(6 rows)
解釋(分析,緩沖)
# explain (analyze, buffers) SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5.42..49915.33 rows=10000 width=1985) (actual time=8236.899..8260.941 rows=10000 loops=1)
Buffers: shared hit=724036 read=2187905 dirtied=17 written=35
-> Index Scan using import_csv_id_idx on import_csv (cost=0.43..19144790.06 rows=3835870 width=1985) (actual time=8236.896..8260.104 rows=10001 loops=1)
Filter: (NOT processed)
Rows Removed by Filter: 3482252
Buffers: shared hit=724036 read=2187905 dirtied=17 written=35
Planning time: 0.386 ms
Execution time: 8261.406 ms
(8 rows)
它很慢,因為在找到第 10001 個通過之前,它必須挖掘未通過processed = False
標准的 3482252 行,並且顯然所有那些失敗的行都隨機散布在表格周圍,導致很多慢 IO。
您需要(processed, id)
或(id) where processed = false
如果您執行其中的第一個,您可以將索引放在單獨處理上,因為它不再獨立有用(如果它曾經開始的話)。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.