PostgreSQL Bitmap Heap Scan on index is very slow, but Index Only Scan is fast

Question

I created a table with 43 million rows and populated val with values from 0 to 200, so each value occurs roughly 215k times, spread throughout the table.

create table foo (id integer primary key, val bigint);
insert into foo
  select i, random() * 200 from generate_series(1, 43000000) as i;
create index val_index on foo(val);
vacuum analyze foo;
explain analyze select id from foo where val = 55;

Result: http://explain.depesz.com/s/fdsm

I expect a total runtime of < 1s; is that possible? I have an SSD, a Core i5 (1.8), 4 GB of RAM, and Postgres 9.3.

If I use an Index Only Scan, it runs very fast:

explain analyze select val from foo where val = 55;

http://explain.depesz.com/s/7hm

But I need to select id, not val, so an Index Only Scan is not suitable in my case.

Thanks in advance!

Additional info:

SELECT relname, relpages, reltuples::numeric, pg_size_pretty(pg_table_size(oid)) 
FROM pg_class WHERE oid='foo'::regclass;

Result:

"foo";236758;43800000;"1850 MB"

Config:

"cpu_index_tuple_cost";"0.005";""
"cpu_operator_cost";"0.0025";""
"cpu_tuple_cost";"0.01";""
"effective_cache_size";"16384";"8kB"
"max_connections";"100";""
"max_stack_depth";"2048";"kB"
"random_page_cost";"4";""
"seq_page_cost";"1";""
"shared_buffers";"16384";"8kB"
"temp_buffers";"1024";"8kB"
"work_mem";"204800";"kB"

Answer 1

I got an answer here: http://ask.use-the-index-luke.com/questions/235/postgresql-bitmap-heap-scan-on-index-is-very-slow-but-index-only-scan-is-fast

The trick is to use a composite index on val and id:

create index val_id_index on foo(val, id);

This way an Index Only Scan is used, and I can now select id.

select id from foo where val = 55;

Result:

http://explain.depesz.com/s/nDt3

But this works ONLY in Postgres 9.2+. If you are forced to use an older version, try other options.
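Whether the scan really avoids the heap also depends on the visibility map, which VACUUM maintains. The following is a minimal sketch of how one might check this, reusing the table and index names from this thread; the commented-out covering index and its name val_incl_id are an assumption for readers on PostgreSQL 11 or later, not something the asker's 9.3 supports.

```sql
-- Assumes the table foo and the index val_id_index created above.
VACUUM ANALYZE foo;  -- refreshes the visibility map and planner statistics

EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM foo WHERE val = 55;
-- In the plan, look for "Index Only Scan using val_id_index"
-- and a "Heap Fetches" count close to 0.

-- Alternative on PostgreSQL 11+ only (hypothetical index name):
-- CREATE INDEX val_incl_id ON foo (val) INCLUDE (id);
```

If Heap Fetches stays high, the table has probably been modified since the last vacuum, and the scan still has to visit heap pages to check row visibility.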

Answer 2

Although you're querying only 0.5% of the table, or ~10MB worth of data (out of a nearly 2GB table), the values of interest are spread evenly across the whole table.

You can see it in the first plan you've provided:

BitmapIndexScan completes in 123.172ms
BitmapHeapScan takes 17055.046ms.

You can try clustering your table based on index order, which will put matching rows together on the same pages. On my SATA disks I get the following:
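A back-of-the-envelope estimate makes this gap plausible. Assuming about 215,000 matching rows land uniformly at random on the 236,758 heap pages reported in the question, the expected number of distinct pages touched is approximately P * (1 - exp(-n / P)). This is a rough model of the situation, not something the planner reports:

```sql
-- P = 236758 heap pages (relpages from the question)
-- n = 215000 matching rows (43 million rows / 200 values, an approximation)
SELECT round(236758 * (1 - exp(-215000.0 / 236758))) AS expected_heap_pages;
```

The result is on the order of 140,000 pages, so more than half of the table's pages have to be visited to collect about 0.5% of its rows.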

SET work_mem TO '300MB';
EXPLAIN (analyze,buffers) SELECT id FROM foo WHERE val = 55;

  Bitmap Heap Scan on foo  (...) (actual time=90.315..35091.665 rows=215022 loops=1)
    Heap Blocks: exact=140489
    Buffers: shared hit=20775 read=120306 written=24124

SET maintenance_work_mem TO '1GB';
CLUSTER foo USING val_index;
EXPLAIN (analyze,buffers) SELECT id FROM foo WHERE val = 55;

  Bitmap Heap Scan on foo  (...) (actual time=49.215..407.505 rows=215022 loops=1)
    Heap Blocks: exact=1163
    Buffers: shared read=1755

Of course, CLUSTER is a one-time operation: the physical ordering is not maintained for rows inserted or updated later, so the query will gradually get slower again over time.
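One way to watch how well the physical row order still matches the index is the correlation statistic in pg_stats. This is a sketch that assumes the table and column names used in this thread; the value is only refreshed by ANALYZE.

```sql
-- Refresh statistics first; CLUSTER does not update them by itself.
ANALYZE foo;

-- correlation near 0: matching rows are scattered across the heap.
-- correlation near 1 or -1: rows are stored in (reverse) index order.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'foo' AND attname = 'val';
```

When the value drifts back toward 0 after many writes, re-running CLUSTER restores the ordering, at the cost of an exclusive lock on the table while it runs.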

Answer 3

You can try reducing random_page_cost; for an SSD it can be 1. Second, you can increase work_mem; 10MB is a relatively low value for current servers with gigabytes of RAM. You should also recheck effective_cache_size, as it may be too low as well.

work_mem * max_connections * 2 + shared_buffers < RAM dedicated for Postgres
effective_cache_size ~ shared_buffers + file system cache
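The first rule of thumb can be evaluated against a running server. The sketch below assumes that pg_settings reports work_mem in kB and shared_buffers in 8kB blocks (the default block size), as the config listing in the question suggests. The session-level SET is only for experimenting and does not change postgresql.conf.

```sql
-- Left-hand side of: work_mem * max_connections * 2 + shared_buffers
SELECT pg_size_pretty(
         (SELECT setting::bigint * 1024 FROM pg_settings WHERE name = 'work_mem')
       * (SELECT setting::bigint FROM pg_settings WHERE name = 'max_connections')
       * 2
       + (SELECT setting::bigint * 8192 FROM pg_settings WHERE name = 'shared_buffers')
       ) AS worst_case_memory;

-- Try a lower random_page_cost for the current session only.
SET random_page_cost = 1;
EXPLAIN ANALYZE SELECT id FROM foo WHERE val = 55;
```

With the values posted in the question (work_mem 204800 kB, max_connections 100, shared_buffers 16384 blocks) the left-hand side comes to about 39 GB, far above the machine's 4 GB of RAM, so by this rule work_mem or max_connections would need to come down.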
