PostgreSQL multicolumn index for BETWEEN and ORDER BY

Question

I have a large table (100M records) with the following structure.

  length   |          created_at
-----------+-------------------------------
 506225551 | 2018-12-29 02:08:34.116618
 133712971 | 2018-10-19 21:20:14.568936
 608443439 | 2018-12-14 03:22:55.141416
 927160571 | 2019-01-30 00:51:41.639126
 407033524 | 2018-11-16 21:26:41.523047
 506008096 | 2018-11-17 00:07:42.839919
 457719749 | 2018-11-12 02:32:53.116225

0 < length < 1000000000
'2017-01-01' < created_at < '2019-02-01'
The data is evenly distributed over length and created_at.

I want to run queries like this:

SELECT * FROM tbl WHERE length BETWEEN 2000000 AND 3000000 ORDER BY created_at DESC;

There are about 100K results between 2000000 and 3000000, so I want to use an index for both selecting and ordering.

I have tried these approaches:

1. Simple BTREE index

create index on tbl(length);

This works well for a short range of length, but I cannot use this index for ordering the records.

2. Multicolumn BTREE index

 create index on tbl(length, created_at);

I can use this index only for queries like this:

 SELECT * FROM tbl WHERE length = 2000000 ORDER BY created_at DESC;

3. GIST index with the btree_gist extension. I expected this index to work.

create index on tbl using gist(length, created_at);

But it didn't. I cannot use this index even for a simple query like this:

test=# explain analyze select * from gist_test where a = 345 order by c desc;

                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=25706.37..25730.36 rows=9597 width=12) (actual time=4.839..5.568 rows=10000 loops=1)
   Sort Key: c DESC
   Sort Method: quicksort  Memory: 853kB
   ->  Bitmap Heap Scan on gist_test  (cost=370.79..25071.60 rows=9597 width=12) (actual time=1.402..2.869 rows=10000 loops=1)
         Recheck Cond: (a = 345)
         Heap Blocks: exact=152
         ->  Bitmap Index Scan on gist_test_a_b_c_idx  (cost=0.00..368.39 rows=9597 width=0) (actual time=1.384..1.384 rows=10000 loops=1)
               Index Cond: (a = 345)
 Planning time: 0.119 ms
 Execution time: 6.271 ms

I can use this index only as a simple BTREE on one column.

So, how can I solve this problem?

Maybe there are NoSQL databases that can process queries of this kind?

Answer 1

I do not think it is possible (at least in vanilla PostgreSQL; I do not know of an extension that could help with that). The sorting step can be skipped only when the index already produces the records in sorted order.
However:

As mentioned in the doc, only B-tree indexes can be used for sorting (which makes sense, since they are implemented as search trees).
Your WHERE and your ORDER BY are incompatible for a B-tree index:
- Because you have both clauses, you need to put 2 columns in the index (A, B).
- Data in the index is sorted by (A, B), so it is also sorted by A (which is why PostgreSQL can index-scan the table quickly when the WHERE is on A only). As a consequence, it is not sorted by B across the index: it is sorted by B only within each subset where A is constant, not across the entire table.
- As you probably already know, an index on B alone will be of little help because of the WHERE.
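
To make the ordering argument concrete, here is a minimal sketch in Python. It is a toy model, not PostgreSQL code: a composite index on (A, B) is represented as a sorted list of tuples, and the names index, scan and is_sorted are hypothetical helpers introduced only for this illustration.

```python
# Toy model of a B-tree index on (a, b): a list of tuples kept in sorted order.
# This only illustrates the ordering argument; it is not how PostgreSQL is used.

rows = [(a, b) for a in (1, 2, 3) for b in (30, 10, 20)]
index = sorted(rows)  # ordered by a, then by b within equal a


def scan(index, a_low, a_high):
    """Return the b values, in index order, for a_low <= a <= a_high."""
    return [b for a, b in index if a_low <= a <= a_high]


def is_sorted(values):
    """True if values are in non-decreasing order."""
    return all(x <= y for x, y in zip(values, values[1:]))


equal_scan = scan(index, 2, 2)  # models WHERE a = 2
range_scan = scan(index, 1, 3)  # models WHERE a BETWEEN 1 AND 3

print(equal_scan, is_sorted(equal_scan))  # [10, 20, 30] True
print(range_scan, is_sorted(range_scan))  # [10, 20, 30, 10, 20, 30, 10, 20, 30] False
```

With a single value of a, the b values come out already ordered, so no extra sort is needed. Over a range of a, the b values restart at every new a, so the output still has to be sorted.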

Your example #2 shows that PostgreSQL is well optimized for the case where you filter on a single value of A.

If it is unacceptable to sort on the 2 columns (A, B), then I'm afraid you shouldn't expect more than this.

Answer 1: accepted, 2019-02-02 12:42:55
