PostgreSQL - Calculate the minimum distance between two points

I have a fairly large point layer (just over 1 million points), and for each point I would like to find the shortest distance to another point in the same layer (its nearest neighbor). After some research online, I turned to the CROSS JOIN LATERAL clause.

However, the query never finishes (more than 5 hours without completing). I compared it with the QGIS Distance Matrix tool, where the computation seems much faster (progress advances about 10% every 5 minutes). I suspect the cause is a poorly formulated query.

Here is the code I used:

with couche_points as (select * from public.centroides_batis_all)
select p.id, t.id_2, t.dist
from couche_points p cross join lateral(
select r.id as id_2, p.geom <-> r.geom as dist
from couche_points r
where p.id <> r.id
order by p.geom <-> r.geom
limit 1) as t

However, everything looks fine to me. Is there a performance difference between PostGIS and QGIS?

Thank you.

What is the point of the dummy CTE? All it does is defeat the use of any index on the real table, which is surely ample explanation for the slowness. Query the table directly:

select p.id, t.id_2, t.dist
from centroides_batis_all p cross join lateral(
select r.id as id_2, p.geom <-> r.geom as dist
from centroides_batis_all r
where p.id <> r.id
order by p.geom <-> r.geom
limit 1) as t;
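
The rewrite above only helps if the KNN operator <-> can actually use a spatial index. As a minimal sketch, assuming no GiST index exists yet on geom (the index name below is hypothetical, and the 1000-row sample is only there to keep the plan check cheap), the index could be created and the plan inspected like this:

```sql
-- Hypothetical index name; this is a no-op if an index with this name already exists.
CREATE INDEX IF NOT EXISTS centroides_batis_all_geom_gist
    ON centroides_batis_all USING gist (geom);

-- Refresh planner statistics after building the index.
ANALYZE centroides_batis_all;

-- Check the plan on a small sample before running the query on every row.
EXPLAIN
SELECT p.id, t.id_2, t.dist
FROM (SELECT id, geom FROM centroides_batis_all LIMIT 1000) AS p
CROSS JOIN LATERAL (
    SELECT r.id AS id_2, p.geom <-> r.geom AS dist
    FROM centroides_batis_all AS r
    WHERE p.id <> r.id
    ORDER BY p.geom <-> r.geom
    LIMIT 1
) AS t;
```

If the index is being used, the inner side of the plan should show an index scan on r ordered by the <-> expression rather than a sequential scan followed by a sort.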

As far as I can see, your query builds a matrix like this:

   p1  p2  p3  p4  ... pn
p1 --- d21 d31 d41 ... dn1
p2 d12 --- d32 d42 ... dn2
p3 d13 d23 --- d43 ... dn3
p4 d14 d24 d34 --- ... dn4
..........................
pn d1n d2n d3n d4n ... ---

But you actually need only half of it, because the bottom-left half just duplicates the top-right half with the points swapped:

   p1  p2  p3  p4  ... pn
p1 --- d21 d31 d41 ... dn1
p2 --- --- d32 d42 ... dn2
p3 --- --- --- d43 ... dn3
p4 --- --- --- --- ... dn4
..........................
pn --- --- --- --- ... ---

select t1.id as id, t2.id as id_2, t2.dist
from
  centroides_batis_all as t1 cross join lateral (
    select t2.id, t1.geom <-> t2.geom as dist
    from centroides_batis_all as t2 where t1.id < t2.id -- the main difference here
    order by dist limit 1) as t2;

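This query returns each pair only once, such as p1-p2 but not p2-p1 (the distance is the same, of course). Note that each point is compared only against points with a larger id, so the row returned for a point is its nearest point among those, which is not necessarily its overall nearest neighbor.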

To fix this, you could duplicate the rows from the previous query with the points swapped:

with cte as (
  select t1.id as id, t2.id as id_2, t2.dist
  from
    centroides_batis_all as t1 cross join lateral (
      select t2.id, t1.geom <-> t2.geom as dist
      from centroides_batis_all as t2 where t1.id < t2.id
      order by dist limit 1) as t2)
select
  case t.n when 1 then cte.id else cte.id_2 end as id,
  case t.n when 1 then cte.id_2 else cte.id end as id_2,
  cte.dist
from cte, (values(1), (2)) as t(n);

Maybe you have the option to split your points into four areas.

One distance matrix with 1,000,000 points needs 1,000,000 x 1,000,000 = 1,000,000,000,000 calculations.

Four distance matrices with 250,000 points each need 4 x 250,000 x 250,000 = 250,000,000,000 calculations.

That's just 1/4 of the calculations. Of course, you have to work out how to handle the borders where the split areas meet, but it should be much quicker.
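
The arithmetic above generalizes. As a rough sketch, assuming the n points split evenly into k areas, every pair inside an area is compared by brute force, and the border handling is ignored:

$$
k \cdot \left(\frac{n}{k}\right)^{2} = \frac{n^{2}}{k},
\qquad
n = 10^{6},\; k = 4:\quad 4 \cdot \left(2.5 \times 10^{5}\right)^{2} = 2.5 \times 10^{11} = \frac{10^{12}}{4}
$$

So under these assumptions the saving grows linearly with the number of areas, at the cost of more borders to reconcile.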
