PostgreSQL does a seq scan instead of an index only scan

Question

I have the following table structure:

create table transfers
(
    id serial not null
        constraint transactions_pkey
            primary key,
    name varchar(255) not null,
    money integer not null
);

create index transfers_name_index
    on transfers (name);

The following query is quite slow because it does a sequential scan:

EXPLAIN ANALYZE SELECT name
FROM transfers
GROUP by name
ORDER BY name ASC;

Group  (cost=37860.49..41388.54 rows=14802 width=15) (actual time=4285.530..7459.872 rows=999766 loops=1)
  Group Key: name
  ->  Gather Merge  (cost=37860.49..41314.53 rows=29604 width=15) (actual time=4285.529..7136.432 rows=999935 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=36860.46..36897.47 rows=14802 width=15) (actual time=4104.159..5107.148 rows=333312 loops=3)
              Sort Key: name
              Sort Method: external merge  Disk: 14928kB
              Worker 0:  Sort Method: external merge  Disk: 13616kB
              Worker 1:  Sort Method: external merge  Disk: 13656kB
              ->  Partial HashAggregate  (cost=35687.15..35835.17 rows=14802 width=15) (actual time=604.984..689.111 rows=333312 loops=3)
                    Group Key: name
                    ->  Parallel Seq Scan on transfers  (cost=0.00..32571.52 rows=1246252 width=15) (actual time=0.063..200.548 rows=997032 loops=3)
Planning Time: 0.088 ms
Execution Time: 7531.142 ms

However, when I set enable_seqscan to off, the index only scan is used, as I would expect:

SET enable_seqscan = OFF;

EXPLAIN ANALYZE SELECT name
FROM transfers
GROUP by name
ORDER BY name ASC;

Group  (cost=1000.45..100492.67 rows=14802 width=15) (actual time=8.032..2212.538 rows=999766 loops=1)
  Group Key: name
  ->  Gather Merge  (cost=1000.45..100418.66 rows=29604 width=15) (actual time=8.029..1880.388 rows=999778 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Group  (cost=0.43..96001.60 rows=14802 width=15) (actual time=0.074..383.471 rows=333259 loops=3)
              Group Key: name
              ->  Parallel Index Only Scan using transfers_name_index on transfers  (cost=0.43..92885.97 rows=1246252 width=15) (actual time=0.066..189.436 rows=997032 loops=3)
                    Heap Fetches: 0
Planning Time: 0.197 ms
Execution Time: 2279.321 ms

Why does Postgres not use the more efficient index only scan without being forced to? The table contains about 3 million records. I am using PostgreSQL 11.2.
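To repeat this comparison without leaving enable_seqscan disabled for the rest of the session, one option is to scope the setting to a single transaction with SET LOCAL. That setting reverts when the transaction ends. The following is a minimal sketch using the query from the question. EXPLAIN ANALYZE really executes the SELECT, but nothing is modified, so rolling back is harmless.

```sql
BEGIN;
-- SET LOCAL only lasts until the end of the current transaction
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT name
FROM transfers
GROUP BY name
ORDER BY name ASC;

-- Ending the transaction restores the previous enable_seqscan value
ROLLBACK;
```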

Answer 1

Try adding a decent amount of data and running the queries again. Postgres doesn't always use the index. If the table contains only a few records, it may decide that a sequential scan will be quicker.
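One way to load a test volume comparable to the question's (about 3 million rows) is generate_series. The following is a minimal sketch. The 'name_' prefix, the 1,000,000 distinct names and the random money values are arbitrary assumptions for testing, not data taken from the question.

```sql
-- Hypothetical test data: 3,000,000 rows cycling through 1,000,000 distinct names
INSERT INTO transfers (name, money)
SELECT 'name_' || (g % 1000000),
       (random() * 10000)::integer
FROM generate_series(1, 3000000) AS g;

-- Refresh planner statistics and the visibility map before re-running EXPLAIN ANALYZE
-- (VACUUM cannot run inside a transaction block)
VACUUM ANALYZE transfers;
```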

Answer 2

For Postgres to prefer the index only scan, most of the table's pages should be marked all-visible in the visibility map. You can check this in pg_class:

SELECT relpages, relallvisible FROM pg_class WHERE relname='transfers';

If relallvisible is 0 or much lower than relpages, you should VACUUM the table:

VACUUM ANALYZE transfers;
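The following sketch turns the two numbers into a percentage and shows when the table was last vacuumed or analyzed, using the standard pg_stat_user_tables view. Keep in mind that relpages and relallvisible are planner estimates. They are only refreshed by VACUUM, ANALYZE and a few DDL commands such as CREATE INDEX, so they can lag behind the table's real state.

```sql
-- Share of the table's pages marked all-visible (NULLIF avoids division by zero)
SELECT relpages,
       relallvisible,
       round(100.0 * relallvisible / NULLIF(relpages, 0), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'transfers';

-- Last manual and automatic VACUUM / ANALYZE runs for the table
SELECT last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'transfers';
```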

Answer 3

When I fill your table with 3e6 rows containing 1e6 distinct names, I get the index only scan. However, if I force the distinct value estimate to match yours (rows=14802 in your plans), it switches to the seq scan:

alter table transfers alter name set (N_DISTINCT = 14802);
analyze transfers;

So if you use the same method to set it to the correct value, I bet your query would switch the other way.
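Applied to the question's numbers, this could look like the sketch below. The actual row counts in the plans show roughly 1,000,000 distinct names in about 3 million rows. A negative n_distinct is read as a fraction of the row count, so -0.33 (an approximation of that ratio) keeps the estimate proportional if the table grows. The override only takes effect at the next ANALYZE.

```sql
-- Tell the planner that roughly a third of the rows have a distinct name
ALTER TABLE transfers ALTER COLUMN name SET (n_distinct = -0.33);
ANALYZE transfers;

-- Check which plan is chosen with the corrected estimate
EXPLAIN SELECT name FROM transfers GROUP BY name ORDER BY name ASC;

-- To drop the override and return to ANALYZE's own estimate
ALTER TABLE transfers ALTER COLUMN name RESET (n_distinct);
ANALYZE transfers;
```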

Why is it wrong in the first place? I bet your table is clustered on name, and your default_statistics_target is too low.
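Both guesses can be checked in the pg_stats view. A correlation close to 1 or -1 means the physical row order closely follows name. The following is a sketch. The per-column target of 1000 is an arbitrary example value; the default for default_statistics_target is 100, and a larger target makes ANALYZE sample more rows.

```sql
-- What ANALYZE currently believes about the name column
-- (negative n_distinct = fraction of rows; correlation in [-1, 1])
SELECT n_distinct, correlation
FROM pg_stats
WHERE tablename = 'transfers' AND attname = 'name';

SHOW default_statistics_target;

-- Sample more rows for this column only, then re-analyze
ALTER TABLE transfers ALTER COLUMN name SET STATISTICS 1000;
ANALYZE transfers;
```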

3 answers

Answer 1
1 2019-11-12 17:16:55

Answer 2
1 2019-11-12 17:59:38

Answer 3
1 2019-11-12 21:40:03
