I would like to retrieve all points within a given range of another set of points. Let's say, find all shops within 500m of any subway station.
I wrote this query, which is quite slow, and would like to optimize it:
SELECT DISCTINCT ON(locations.id) locations.id FROM locations, pois
WHERE pois.poi_kind = 'subway'
AND ST_DWithin(locations.coordinates, pois.coordinates, 500, false);
I'm running on latest versions of Postgres and PostGis (Postgres 9.5, PostGis 2.2.1)
Here is the table metadata:
Table "public.locations"
Column | Type | Modifiers
--------------------+-----------------------------+--------------------------------------------------------
id | integer | not null default nextval('locations_id_seq'::regclass)
coordinates | geometry |
Indexes:
"locations_coordinates_index" gist (coordinates)
Table "public.pois"
Column | Type | Modifiers
-------------+-----------------------------+---------------------------------------------------
id | integer | not null default nextval('pois_id_seq'::regclass)
coordinates | geometry |
poi_kind_id | integer |
Indexes:
"pois_pkey" PRIMARY KEY, btree (id)
"pois_coordinates_index" gist (coordinates)
"pois_poi_kind_id_index" btree (poi_kind_id)
Foreign-key constraints:
"pois_poi_kind_id_fkey" FOREIGN KEY (poi_kind_id) REFERENCES poi_kinds(id)
Here is the result of EXPLAIN (ANALYZE, BUFFERS):
Unique (cost=2407390.71..2407390.72 rows=2 width=4) (actual time=3338.080..3338.252 rows=918 loops=1)
Buffers: shared hit=559
-> Sort (cost=2407390.71..2407390.72 rows=2 width=4) (actual time=3338.079..3338.145 rows=963 loops=1)
Sort Key: locations.id
Sort Method: quicksort Memory: 70kB
Buffers: shared hit=559
-> Nested Loop (cost=0.00..2407390.71 rows=2 width=4) (actual time=2.466..3337.835 rows=963 loops=1)
Join Filter: (((pois.coordinates)::geography && _st_expand((locations.coordinates)::geography, 500::double precision)) AND ((locations.coordinates)::geography && _st_expand((pois.coordinates)::geography, 500::double precision)) AND _st_dwithin((pois.coordinates)::geography, (locations.coordinates)::geography, 500::double precision, false))
Rows Removed by Join Filter: 4531356
Buffers: shared hit=559
-> Seq Scan on locations (cost=0.00..791.68 rows=24168 width=36) (actual time=0.005..3.100 rows=24237 loops=1)
Buffers: shared hit=550
-> Materialize (cost=0.00..10.47 rows=187 width=32) (actual time=0.000..0.009 rows=187 loops=24237)
Buffers: shared hit=6
-> Seq Scan on pois (cost=0.00..9.54 rows=187 width=32) (actual time=0.015..0.053 rows=187 loops=1)
Filter: (poi_kind_id = 3)
Rows Removed by Filter: 96
Buffers: shared hit=6
Planning time: 0.184 ms
Execution time: 3338.304 ms
(20 rows)
I think that you are using the geography version of st_dwithin, because of the fourth parameter.
Try changing your query to this one:
SELECT DISCTINCT ON(locations.id) locations.id FROM locations, pois
WHERE pois.poi_kind = 'subway'
AND ST_DWithin(locations.coordinates, pois.coordinates, 500);
If it doesn't solve, please post the explain analyze again.
I eventually came to the conclusion I could not live-compute distance between thousands of point of interest and thousands of locations within a realistic amount of time (< 1sec).
So instead I precompute everything: each time a location or a POI is created/updated, I store the minimum distance between each location and each kind of POI in order to be able to answer the question "which locations are closer than X meters from this kind of POI".
Here is the module I coded for this purpose (it's in Elixir, but the main part is raw SQL)
defmodule My.POILocationDistanceService do
alias Ecto.Adapters.SQL
alias My.Repo
def delete_distance_for_location(location_id) do
run_query!("DELETE FROM poi_location_distance WHERE location_id = $1::integer", [location_id])
end
def delete_distance_for_poi_kind(poi_kind_id) do
run_query!("DELETE FROM poi_location_distance WHERE poi_kind_id = $1::integer", [poi_kind_id])
end
def insert_distance_for_location(location_id) do
sql = """
INSERT INTO poi_location_distance(poi_kind_id, location_id, poi_id, distance)
SELECT
DISTINCT ON (p.poi_kind_id)
p.poi_kind_id as poi_kind_id,
l.id as location_id,
p.id as poi_id,
MIN(ST_Distance_Sphere(l.coordinates, p.coordinates)) as distance
FROM locations l, pois p
WHERE
l.id = $1
AND ST_DWithin(l.coordinates, p.coordinates, $2, FALSE)
GROUP BY p.poi_kind_id, p.id, l.id
ORDER BY p.poi_kind_id, distance;
"""
run_query!(sql, [location_id, max_distance])
end
def insert_distance_for_poi_kind(poi_kind_id, offset \\ 0, limit \\ 10_000_000) do
sql = """
INSERT INTO poi_location_distance(poi_kind_id, location_id, poi_id, distance)
SELECT
DISTINCT ON(l.id, p.poi_kind_id)
p.poi_kind_id as poi_kind_id,
l.id as location_id,
p.id as poi_id,
MIN(ST_Distance_Sphere(l.coordinates, p.coordinates)) as distance
FROM pois p, (SELECT * FROM locations OFFSET $1 LIMIT $2) as l
WHERE
p.poi_kind_id = $3
AND ST_DWithin(l.coordinates, p.coordinates, $4, FALSE)
GROUP BY l.id, p.poi_kind_id, p.id;
"""
run_query!(sql, [offset, limit, poi_kind_id, max_distance])
end
defp run_query!(query, params) do
SQL.query!(Repo, query, params)
end
def max_distance, do: 5000
end
I think you should change a solution, postgis is still running a query in Structured Database, it is powerful, but not quick in special requirement,may be you need elasticsearch.
elasticsearch is good at Geo location search ,but not good at Geo data process,I think you need them both.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.