PostgreSQL multicolumn index for BETWEEN and ORDER BY
I have a large table (100M records) with the following structure.
length | created_at
-----------+-------------------------------
506225551 | 2018-12-29 02:08:34.116618
133712971 | 2018-10-19 21:20:14.568936
608443439 | 2018-12-14 03:22:55.141416
927160571 | 2019-01-30 00:51:41.639126
407033524 | 2018-11-16 21:26:41.523047
506008096 | 2018-11-17 00:07:42.839919
457719749 | 2018-11-12 02:32:53.116225
0 < length < 1000000000
'2017-01-01' < created_at < '2019-02-01'
length and created_at are uniformly distributed. I want to run queries like this:
SELECT * FROM tbl WHERE length BETWEEN 2000000 and 3000000 ORDER BY created_at DESC
There are 100K results between 2000000 and 3000000, so I want to use an index for both selecting and ordering.
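To reproduce the setup at a smaller scale, something like the following could be used. It is only a sketch: the column types, the 1M row count (instead of 100M), and the use of random() to get a roughly uniform distribution are assumptions, not taken from the original table.

```sql
-- Hypothetical, scaled-down reproduction of the table described above.
CREATE TABLE tbl (
    length     integer   NOT NULL,
    created_at timestamp NOT NULL
);

-- 1M rows with length in 1..999999999 and created_at between
-- 2017-01-01 and 2019-02-01, both roughly uniform.
INSERT INTO tbl (length, created_at)
SELECT (random() * 999999998)::int + 1,
       timestamp '2017-01-01'
           + random() * (timestamp '2019-02-01' - timestamp '2017-01-01')
FROM generate_series(1, 1000000);

-- Refresh planner statistics after the bulk load.
ANALYZE tbl;
```

At this scale the BETWEEN 2000000 AND 3000000 range matches on the order of 1,000 rows rather than 100K.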
I have tried these approaches:
1. Simple BTREE index
create index on tbl(length);
This works well for a short range on length, but I cannot use this index for ordering the records.
2. Multicolumn BTREE index
create index on tbl(length, created_at);
I can use this index only for queries like this:
SELECT * FROM tbl WHERE length = 2000000 ORDER BY created_at DESC
3. GIST index with the btree_gist extension
I expected this index to work.
create index on tbl using gist(length, created_at);
But it didn't. I cannot use this index even for a simple query like this:
test=# explain analyze select * from gist_test where a = 345 order by c desc;
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=25706.37..25730.36 rows=9597 width=12) (actual time=4.839..5.568 rows=10000 loops=1)
   Sort Key: c DESC
   Sort Method: quicksort  Memory: 853kB
   ->  Bitmap Heap Scan on gist_test  (cost=370.79..25071.60 rows=9597 width=12) (actual time=1.402..2.869 rows=10000 loops=1)
         Recheck Cond: (a = 345)
         Heap Blocks: exact=152
         ->  Bitmap Index Scan on gist_test_a_b_c_idx  (cost=0.00..368.39 rows=9597 width=0) (actual time=1.384..1.384 rows=10000 loops=1)
               Index Cond: (a = 345)
 Planning time: 0.119 ms
 Execution time: 6.271 ms
I can use this index only as a simple BTREE on one column.
So, how can I solve this problem?
Maybe there are NoSQL databases that can process queries of this kind?
I do not think it is possible (at least in vanilla PostgreSQL; I do not know of an extension that could help with that). The sorting step can be skipped only when an index already produces the records in the requested order.
However, your WHERE and your ORDER BY are incompatible for a B-tree index:
A B-tree index on the 2 columns (A, B) is sorted by (A, B), therefore it is also sorted by A (which is why PostgreSQL is able to index-scan the table fast when the WHERE is on A only). As a consequence, it is not sorted by B in the index: it is sorted by B only within each subset where A is constant, but not across the entire table.
Indexing B only will be of little help because of the WHERE.
The provided example #2 shows that PostgreSQL is well-optimized for the case where you filter on a single value of A.
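The ordering argument can be seen on a handful of rows. The following sketch uses made-up values and a plain ORDER BY a, b to mimic the order in which a B-tree index on (A, B) stores its entries; it does not create a real index.

```sql
-- Made-up rows, returned in the order a B-tree on (a, b) would store them.
SELECT a, b
FROM (VALUES (1, 30), (1, 40), (2, 10), (2, 20), (3, 5)) AS t(a, b)
WHERE a BETWEEN 1 AND 2
ORDER BY a, b;
--  a | b
-- ---+----
--  1 | 30
--  1 | 40
--  2 | 10
--  2 | 20
```

Within a = 1 or a = 2 the b values are ascending, but across the whole range they come out as 30, 40, 10, 20. Reading the index in order therefore does not give rows sorted by b, and an extra sort is needed for ORDER BY b.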
If it is unacceptable to sort on the 2 columns (A, B), then I'm afraid you shouldn't expect more than this.
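As a sketch of what sorting on both columns would look like: the ORDER BY below matches the column order of the index from approach 2, so a backward scan of that index can in principle return rows already in the requested order. The table definition and the index name are assumptions added so the snippet runs on its own, and the plan actually chosen depends on the data and planner settings.

```sql
-- Assumed column types; has no effect if tbl already exists.
CREATE TABLE IF NOT EXISTS tbl (
    length     integer   NOT NULL,
    created_at timestamp NOT NULL
);

-- Same index as approach 2, with an explicit (hypothetical) name.
CREATE INDEX IF NOT EXISTS tbl_length_created_at_idx
    ON tbl (length, created_at);

-- Ordering by (length, created_at) follows the index order, so a separate
-- sort step is not required in principle.
EXPLAIN
SELECT *
FROM tbl
WHERE length BETWEEN 2000000 AND 3000000
ORDER BY length DESC, created_at DESC;
```

This returns the same rows as the original query, only ordered by length first and by created_at within equal lengths.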