PostgreSQL conducts seq scan instead of index only scan
I have the following table structure:
create table transfers
(
    id    serial       not null
        constraint transactions_pkey
            primary key,
    name  varchar(255) not null,
    money integer      not null
);
create index transfers_name_index
    on transfers (name);
The following query is quite slow because it does a sequential scan:
EXPLAIN ANALYZE SELECT name
FROM transfers
GROUP by name
ORDER BY name ASC;
Group (cost=37860.49..41388.54 rows=14802 width=15) (actual time=4285.530..7459.872 rows=999766 loops=1)
Group Key: name
-> Gather Merge (cost=37860.49..41314.53 rows=29604 width=15) (actual time=4285.529..7136.432 rows=999935 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=36860.46..36897.47 rows=14802 width=15) (actual time=4104.159..5107.148 rows=333312 loops=3)
Sort Key: name
Sort Method: external merge Disk: 14928kB
Worker 0: Sort Method: external merge Disk: 13616kB
Worker 1: Sort Method: external merge Disk: 13656kB
-> Partial HashAggregate (cost=35687.15..35835.17 rows=14802 width=15) (actual time=604.984..689.111 rows=333312 loops=3)
Group Key: name
-> Parallel Seq Scan on transfers (cost=0.00..32571.52 rows=1246252 width=15) (actual time=0.063..200.548 rows=997032 loops=3)
Planning Time: 0.088 ms
Execution Time: 7531.142 ms
However, when I set enable_seqscan to off, the index only scan is used, as I would expect:
SET enable_seqscan = OFF;
EXPLAIN ANALYZE SELECT name
FROM transfers
GROUP by name
ORDER BY name ASC;
Group (cost=1000.45..100492.67 rows=14802 width=15) (actual time=8.032..2212.538 rows=999766 loops=1)
Group Key: name
-> Gather Merge (cost=1000.45..100418.66 rows=29604 width=15) (actual time=8.029..1880.388 rows=999778 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Group (cost=0.43..96001.60 rows=14802 width=15) (actual time=0.074..383.471 rows=333259 loops=3)
Group Key: name
-> Parallel Index Only Scan using transfers_name_index on transfers (cost=0.43..92885.97 rows=1246252 width=15) (actual time=0.066..189.436 rows=997032 loops=3)
Heap Fetches: 0
Planning Time: 0.197 ms
Execution Time: 2279.321 ms
Why does Postgres not choose the more efficient index only scan unless I force it? The table contains about 3 million records. I am using PostgreSQL 11.2.
Try adding a decent amount of data and running the queries again. Postgres doesn't always use the index: if there are only a few records in the table, the planner may decide that a sequential scan is quicker.
For Postgres to prefer the index only scan, most of the table's pages should be marked all-visible in the visibility map. You can check this in pg_class:
SELECT relpages, relallvisible FROM pg_class WHERE relname='transfers';
If relallvisible is 0 or much lower than relpages, you should VACUUM the table:
VACUUM ANALYZE transfers;
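As a small convenience, the same check can be expressed as a percentage. This is only a sketch: the pct_all_visible alias is a name chosen here for illustration, and both counters are estimates that VACUUM and ANALYZE refresh, so run it after the VACUUM above. Note that the forced plan in the question already reports Heap Fetches: 0, which suggests the visibility map was in good shape there.

```sql
-- Share of the table's pages that the visibility map marks all-visible.
-- NULLIF avoids division by zero when relpages is still 0 (table never analyzed).
SELECT relname,
       relpages,
       relallvisible,
       round(100.0 * relallvisible / NULLIF(relpages, 0), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'transfers';
```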
When I fill your table with 3e6 rows containing 1e6 distinct names, I get the index only scan. However, if I force the distinct value estimate to match yours (your plan estimates 14802 groups, while 999766 are actually returned), it switches to the seq scan:
alter table transfers alter name set (N_DISTINCT = 14802);
analyze transfers;
So if you use the same method to set it to the correct value, I bet yours would switch the other way.
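A minimal sketch of that override, assuming the true number of distinct names is the 999766 groups reported in the question's plan output; substitute whatever SELECT count(DISTINCT name) returns on the real table. The override only takes effect at the next ANALYZE.

```sql
-- Pin the planner's distinct-value estimate for transfers.name.
-- 999766 is taken from the actual row count in the question's plan (an assumption
-- about the current data). A negative value such as -0.33 would instead mean
-- "this fraction of the row count", which keeps scaling as the table grows.
ALTER TABLE transfers ALTER COLUMN name SET (n_distinct = 999766);
ANALYZE transfers;

-- Re-check the plan without touching enable_seqscan.
EXPLAIN ANALYZE SELECT name
FROM transfers
GROUP BY name
ORDER BY name ASC;

-- To drop the override later:
-- ALTER TABLE transfers ALTER COLUMN name RESET (n_distinct);
```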
Why is it wrong in the first place? I bet your table is clustered on name, and your default_statistics_target is too low.
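One way to test that guess, sketched under assumptions: the per-column target of 1000 is an illustrative value, not a recommendation from the answer, and a correlation close to 1 or -1 is only consistent with the rows being physically ordered by name, not proof of it.

```sql
-- Current sample-size setting used by ANALYZE (the default is 100).
SHOW default_statistics_target;

-- What the planner currently believes about the column.
SELECT n_distinct, correlation
FROM pg_stats
WHERE tablename = 'transfers' AND attname = 'name';

-- Raise the target for this column only, then gather fresh statistics.
-- 1000 is an arbitrary illustrative value.
ALTER TABLE transfers ALTER COLUMN name SET STATISTICS 1000;
ANALYZE transfers;

-- Compare: n_distinct should move closer to the real number of distinct names.
SELECT n_distinct, correlation
FROM pg_stats
WHERE tablename = 'transfers' AND attname = 'name';
```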