Optimizing a postgis query - why is 2nd index not being used?

We have a table with tens of millions of polygons, and we have this index:

CREATE INDEX IF NOT EXISTS polygons_geog_idx ON polygons USING GIST(geog);

That lets us query the DB really efficiently, like so:

SELECT * FROM polygons WHERE st_dwithin('SRID=4326;POINT(-1 50)'::geography, geog, 500);

Now, due to business requirements, we need to return only the 200 biggest polygons. That is easily doable with:

  • LIMIT 200
  • ORDER BY st_area(geog)

Full query:

SELECT gid, st_area(geog) as size FROM polygons WHERE st_dwithin(geog, 'SRID=4326;POINT(-1 50)'::geography, 500) ORDER BY st_area(geog) DESC LIMIT 200;

Because of the ORDER BY and SELECT, our query slows down by 10x. I thought it would be easily fixable by adding another index, as seen in this SO answer:

CREATE INDEX polygons_geog_area_idx ON polygons (st_area(geog));

But polygons_geog_area_idx doesn't seem to be picked up:

Sort  (cost=8.23..8.23 rows=1 width=12) (actual time=133.755..142.427 rows=2325 loops=1)
  Sort Key: (st_area(geog, true))
  Sort Method: quicksort  Memory: 205kB
  ->  Index Scan using polygons_geog_idx on polygons  (cost=0.14..8.22 rows=1 width=12) (actual time=0.468..121.974 rows=2325 loops=1)
        Index Cond: (geog && '0101000020E6100000C33126587787F1BF3B0D62B197654940'::geography)
        Filter: (('0101000020E6100000C33126587787F1BF3B0D62B197654940'::geography && _st_expand(geog, '500'::double precision)) AND _st_dwithin(geog, '0101000020E6100000C33126587787F1BF3B0D62B197654940'::geography, '500'::double precision, true))
        Rows Removed by Filter: 3
Planning Time: 0.157 ms
Execution Time: 151.196 ms

(Note: this is on a development dataset, much smaller than the actual dataset this will run on later.)

What am I missing? Can you even use 2 indexes like I want?

PostgreSQL cannot combine two indexes in this way, one for the order and one for selectivity.

In order to sort by the area, it first needs to compute the area. The sort itself is fast (taking only about 15% of the time), so it must be the computation of the area that is slow. An EXPLAIN VERBOSE suggests to me that the area is computed as part of the index scan and the result is then passed up to the sort, rather than being computed in the sort itself. So it makes sense that the time spent on it would be attributed to the index scan.
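If you want to check this on your own data, something like the following should show it. This is just the query from the question wrapped in EXPLAIN; with VERBOSE, each plan node prints an Output line listing what it produces. If st_area(geog, true) already appears in the Output of the Index Scan node, the area is being computed there and handed up to the Sort.

```sql
-- Same query as in the question, with per-node output columns shown.
-- Look at the "Output:" line under the Index Scan node.
EXPLAIN (ANALYZE, VERBOSE)
SELECT gid, st_area(geog) AS size
FROM polygons
WHERE st_dwithin(geog, 'SRID=4326;POINT(-1 50)'::geography, 500)
ORDER BY st_area(geog) DESC
LIMIT 200;
```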

To reduce the time needed to compute the area, you could compute it once and store it as part of the table. The best way to do that (with a new enough version, PostgreSQL 12 or later) is with a generated column.

alter table polygons add polygon_area double precision generated always as (st_area(geog)) stored;
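With that column in place, the query can read the stored value instead of calling st_area for every matching row. A sketch, assuming the polygon_area column from the statement above has been added: the spatial filter should still go through polygons_geog_idx, and the sort then works on a plain double precision column.

```sql
-- Assumes polygon_area was added by the ALTER TABLE above.
-- The distance filter is unchanged; only the area lookup differs.
SELECT gid, polygon_area AS size
FROM polygons
WHERE st_dwithin(geog, 'SRID=4326;POINT(-1 50)'::geography, 500)
ORDER BY polygon_area DESC
LIMIT 200;
```

Expect the ALTER TABLE itself to take a while on tens of millions of rows, since the area of every polygon has to be computed once when the column is added.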


 