PostGIS query optimization in Rails

I am using PostGIS 2.0 with a PostgreSQL 9.1 database. My goal is to write a query, as close to optimal as possible, that finds nearby locations within a certain radius and returns them ordered by distance. The Location model has an attribute latlong, which is of a spatial type provided by the PostGIS extension, and a method distance_from that calculates the distance from a given POINT(long lat). I wrote the query in Rails as follows:

def self.nearby(lat, long, radius)
    nearby = Location.where("ST_DWithin(ST_GeomFromEWKB(latlong), ST_GeomFromText('POINT(#{long} #{lat})', 4326),?, false )", radius)
    .order("ST_Distance_Sphere(ST_GeomFromEWKB(latlong) , ST_GeomFromText('POINT(#{long} #{lat})', 4326) ) ")
    .map{|ar| 
      { "id" => ar.id,
        "distance" => ar.distance_from(lat, long)
      } 
    }
end

I can see that I calculate the distance twice, once in the order clause and once in the map block, but I can't think of how to keep the distance value that the SQL query has already computed. So in map{} I recalculate it:

.order("ST_Distance_Sphere(ST_GeomFromEWKB(latlong) , ST_GeomFromText('POINT(#{long} #{lat})', 4326) ) ")

"distance" => ar.distance_from(lat, long)

If I am not wrong, using ST_DWithin in my case lets the database answer quickly whether a location is within the radius, rather than calculating the distance first. So if a query returns only 10-100 locations, ST_DWithin should be faster than relying purely on ST_Distance.

How much more can I improve this? My locations database will hold around 10000 records. I appreciate your time, thanks.

At the moment I'm also working on an application using Rails and PostGIS. :-)

For complex queries I prefer to write plain SQL instead of using ActiveRecord methods; it makes things a bit easier to maintain.

Yours is:

SELECT
  *
FROM location
WHERE
  ST_DWithin(ST_GeomFromEWKB(latlong),
    ST_GeomFromText('POINT(#{long} #{lat})', 4326), ?, false)
ORDER BY
  ST_Distance_Sphere(ST_GeomFromEWKB(latlong),
    ST_GeomFromText('POINT(#{long} #{lat})', 4326))

By the way, those coordinates are called latlon, without the g ;-)

Give me a few minutes and I'll try to figure out how Postgres will optimize your query and whether it needs to be optimized by hand.


The following query can be faster (if there are a lot of matches), but it can also be slower, because ST_DWithin is much faster than ST_Distance or ST_Distance_Sphere. So please test it with a large amount of data:

SELECT
  *
FROM ( 
  SELECT
    l.*,
    (
      ST_DISTANCE_SPHERE(ST_GeomFromEWKB(latlong),
        ST_GeomFromText('POINT(#{long} #{lat})', 4326))
    ) AS d
  FROM location l
) x
WHERE d < ?
ORDER BY d

Explanation:

Your original query first filters the results using the fast ST_DWithin and afterwards calls ST_Distance_Sphere only for the objects it found.

My query calculates ST_Distance_Sphere for ALL objects in the database, and afterwards filters them using a simple numeric comparison.
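The difference between the two strategies can be sketched in plain Ruby. This is only an analogy, not how PostGIS works internally: the method names, the in-memory list of points, the haversine formula, the 1% padding of the box and the Earth radius constant are all assumptions made for illustration. The first method mimics "cheap bounding-box test first, exact distance only for the survivors"; the second mimics "exact distance for every row, then compare".

```ruby
EARTH_RADIUS_M = 6_371_000.0   # assumed mean Earth radius in meters
DEG = Math::PI / 180           # radians per degree

# Great-circle distance in meters (haversine formula).
def haversine_m(lat1, lon1, lat2, lon2)
  dlat = (lat2 - lat1) * DEG
  dlon = (lon2 - lon1) * DEG
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * DEG) * Math.cos(lat2 * DEG) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

# Analogy for the original query: cheap box test, then exact distance.
# The box is padded by 1% so it does not reject points inside the radius
# (assumes a small radius, away from the poles and the antimeridian).
def filter_then_measure(points, lat, lon, radius_m)
  dlat = (radius_m / EARTH_RADIUS_M) / DEG * 1.01
  dlon = dlat / Math.cos(lat * DEG)
  calls = 0
  hits = []
  points.each do |pt|
    next if (pt[:lat] - lat).abs > dlat || (pt[:lon] - lon).abs > dlon
    calls += 1
    d = haversine_m(lat, lon, pt[:lat], pt[:lon])
    hits << pt.merge(distance: d) if d <= radius_m
  end
  [hits.sort_by { |h| h[:distance] }, calls]
end

# Analogy for the subquery variant: exact distance for every point first.
def measure_then_filter(points, lat, lon, radius_m)
  measured = points.map do |pt|
    pt.merge(distance: haversine_m(lat, lon, pt[:lat], pt[:lon]))
  end
  hits = measured.select { |m| m[:distance] <= radius_m }
  [hits.sort_by { |h| h[:distance] }, points.size]
end

points = [
  { id: 1, lat: 1.351, lon: 103.801 },
  { id: 2, lat: 1.36,  lon: 103.81 },
  { id: 3, lat: 2.0,   lon: 105.0 }
]
hits_a, calls_a = filter_then_measure(points, 1.35, 103.8, 2_000)
hits_b, calls_b = measure_then_filter(points, 1.35, 103.8, 2_000)
puts "filter first: #{calls_a} distance calls, measure all: #{calls_b}"
```

Both methods return the same hits; only the number of exact distance computations differs, and that gap grows as fewer points fall inside the radius.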


For use in Rails, you can simply call Location.find_by_sql(...)
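One way to avoid calculating the distance a second time in Ruby is to return it from the query as an extra column. The following is a minimal sketch, not code from the original post: nearby_sql is a hypothetical helper, the table name location is copied from the SQL above (a default Rails model would more likely use locations), and the radius unit depends on how ST_DWithin interprets the arguments. It keeps the ST_DWithin filter from the question and uses ? placeholders instead of string interpolation, so the returned array can be handed to find_by_sql.

```ruby
# Hypothetical helper: builds the [sql, bind, bind, ...] array that
# Location.find_by_sql accepts, so the distance is computed once, in SQL.
# Assumption: the table is named "location", as in the SQL above.
def nearby_sql(lat, long, radius)
  # WKT for the search point; Float() raises ArgumentError on
  # non-numeric input, so arbitrary text cannot reach the query.
  point = "POINT(#{Float(long)} #{Float(lat)})"

  sql = <<-SQL
    SELECT l.*,
           ST_Distance_Sphere(ST_GeomFromEWKB(l.latlong),
                              ST_GeomFromText(?, 4326)) AS distance
    FROM location l
    WHERE ST_DWithin(ST_GeomFromEWKB(l.latlong),
                     ST_GeomFromText(?, 4326), ?, false)
    ORDER BY distance
  SQL

  # One bind value per ? placeholder, in order of appearance.
  [sql, point, point, radius]
end
```

In a Rails app this might be used as Location.find_by_sql(nearby_sql(lat, long, radius)).map { |ar| { "id" => ar.id, "distance" => ar["distance"] } }, which makes the call to distance_from unnecessary. Depending on the Rails version, the extra column may come back as a string and need a to_f.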


EXPLAIN ANALYZE

(My table is called measurement and the column containing the point is called groundtruth.)

Your query:

Sort  (cost=341.05..341.06 rows=1 width=172) (actual time=3.676..3.731 rows=816 loops=1)
  Sort Key: (_st_distance(geography(groundtruth), '0101000020E6100000EE7C3F355EF24F4019390B7BDA011940'::geography, 0::double precision, false))
  Sort Method: quicksort  Memory: 139kB
  ->  Bitmap Heap Scan on measurement m  (cost=9.67..341.04 rows=1 width=172) (actual time=0.330..3.257 rows=816 loops=1)
        Recheck Cond: (groundtruth && '01030000000100000005000000EE7C3F355E724D4064E42CEC6907F43FEE7C3F355E724D408C9C853DED80264077BE9F1A2F3951408C9C853DED80264077BE9F1A2F39514064E42CEC6907F43FEE7C3F355E724D4064E42CEC6907F43F'::geometry)
        Filter: (('0101000000EE7C3F355EF24F4019390B7BDA011940'::geometry && st_expand(groundtruth, 5::double precision)) AND _st_dwithin(groundtruth, '0101000000EE7C3F355EF24F4019390B7BDA011940'::geometry, 5::double precision))
        ->  Bitmap Index Scan on groundtruth_idx  (cost=0.00..9.67 rows=189 width=0) (actual time=0.186..0.186 rows=855 loops=1)
              Index Cond: (groundtruth && '01030000000100000005000000EE7C3F355E724D4064E42CEC6907F43FEE7C3F355E724D408C9C853DED80264077BE9F1A2F3951408C9C853DED80264077BE9F1A2F39514064E42CEC6907F43FEE7C3F355E724D4064E42CEC6907F43F'::geometry)
Total runtime: 3.932 ms

My query:

Sort  (cost=9372.84..9391.92 rows=7634 width=172) (actual time=19.256..19.312 rows=816 loops=1)
  Sort Key: (st_distance(m.groundtruth, '0101000000EE7C3F355EF24F4019390B7BDA011940'::geometry))
  Sort Method: quicksort  Memory: 139kB
  ->  Seq Scan on measurement m  (cost=0.00..8226.01 rows=7634 width=172) (actual time=0.040..18.863 rows=816 loops=1)
        Filter: (st_distance(groundtruth, '0101000000EE7C3F355EF24F4019390B7BDA011940'::geometry) < 5::double precision)
Total runtime: 19.396 ms

As you can see, only 816 of the 22901 rows matched, and my query took much longer (19.396 ms versus 3.932 ms).

If I make the distance bigger, both queries become equally fast.

If all rows (= 22901 rows) are within the search radius, my query is a little bit faster: 180 vs. 210 ms.

So you should probably stay with your solution ;)


Another suggestion that might gain 1-2% performance: don't use ST_GeomFromText. You could use rgeo to pass a Point object to your database directly as a parameter, instead of two coordinates.
