I have a very large number of lat/long coordinates in Table 1 and in Table 2; for example, say 100,000 coordinates in each. For each unique item in Table 1, I need to return the closest coordinate in Table 2, provided it lies within a set maximum distance (say, 100 meters). That means up to 100,000 result rows, minus any Table 1 items that have nothing within 100 meters.
I am fairly familiar with the Geometry and Geography parts of MSSQL, and would traditionally approach this with something like:
Select
Table1ID = T1.ID,
Table2ID = T2.ID,
Distance = T1.PointGeog.STDistance(T2.PointGeog),
Keep = 0
into #Distance
From #Table1 T1
cross join #Table2 T2
where T1.PointGeog.STDistance(T2.PointGeog) <= 100
This returns every item from Table2 that is within 100 meters of an item in Table1.
Then, to limit to only the closest items, I could:
Update #Distance
set Keep = 1
from #Distance D
inner join
(select shortestDist = min(Distance), Table1ID from #Distance GROUP BY Table1ID) A
on A.Table1ID = D.Table1ID and A.shortestDist = D.Distance
and then delete every row where Keep <> 1.
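That cleanup step could be written roughly as follows. This is only a sketch that assumes the #Distance table and Keep column built by the statements above:

```sql
-- Remove every candidate pair that was not flagged as the closest match
-- for its Table1 item.
DELETE FROM #Distance
WHERE Keep <> 1;
```

Note that if two Table2 points are at exactly the same minimum distance from a Table1 point, both rows are flagged and both survive the delete.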
This works, but it takes absolutely forever. The cross join forces SQL Server to evaluate an absurd number of distance calculations (100,000 x 100,000 pairs), which results in roughly 9-minute queries on MSSQL 2016. I can narrow down the portions of Table 1 and Table 2 that I compare using SOME criteria, but really not by much. I'm just not sure how to make the process quicker. Ultimately, I just need the closest item and the distance from T2 to T1.
I have played around with a few different solutions, but I wanted to see if the SO community has any additional ideas on how I could better optimize something like this.
Try CROSS APPLY:
SELECT
T1.ID, TT2.ID, T1.PointGeog.STDistance(TT2.PointGeog)
FROM #Table1 as T1
CROSS APPLY (SELECT TOP 1 T2.ID, T2.PointGeog
FROM #Table2 as T2
WHERE T1.PointGeog.STDistance(T2.PointGeog) <= 100
ORDER BY T1.PointGeog.STDistance(T2.PointGeog) ASC) as TT2
I have played around with a new option, and I think this is the fastest I have gotten the calculation so far: about 3 minutes.
I changed Table1 to be:
select
ID,
PointGeog,
Buffer = PointGeom.STBuffer(8.997741566866716e-4)
into #Table1
where the buffer radius is 100/111139, i.e. 100 meters converted to degrees at roughly 111,139 meters per degree
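The buffer value can be reproduced with a quick calculation. The variable names below are hypothetical and used only for illustration; the 111139 figure is the approximate meters-per-degree value from the post:

```sql
-- Hypothetical variable names, for illustration only.
DECLARE @RadiusMeters float = 100;        -- search radius in meters
DECLARE @MetersPerDegree float = 111139;  -- approximate meters per degree (from the post)

-- Radius expressed in degrees, suitable for a geometry buffer on lat/long values.
SELECT BufferDegrees = @RadiusMeters / @MetersPerDegree;  -- about 8.9977e-4
```

Keep in mind that this is an approximation: a degree of longitude covers fewer meters the farther you move from the equator, so a buffer expressed in degrees is not an exact 100-meter circle on the ground.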
and then
if object_id('tempdb.dbo.#Distance') is not null
drop table #Distance
Select
T1ID = T1.ID,
T1Geog = T1.PointGeog,
T2ID = T2.ID,
T2Geog = T2.PointGeog,
DistanceMeters = cast(null as float),
DistanceMiles = cast(null as float),
Keep = 0
Into #Distance
From #Table1 T1
cross join #Table2 T2
Where T1.Buffer.STIntersects(T2.PointGeom) = 1
This does not calculate the distance; it first culls the dataset to pairs that are within roughly 100 meters of each other. I can then run an UPDATE statement to calculate the distance on a substantially more manageable dataset.
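The follow-up update might look something like this. It is a sketch that assumes the #Distance columns defined above and geography points stored with an SRID (such as 4326) for which STDistance returns meters:

```sql
-- Fill in the distances only for the pairs that survived the buffer filter.
UPDATE #Distance
SET DistanceMeters = T1Geog.STDistance(T2Geog),
    DistanceMiles  = T1Geog.STDistance(T2Geog) / 1609.344;  -- meters per international mile
```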
Create a spatial index on the spatial column of both tables and performance shouldn't be too bad. Something like:
CREATE SPATIAL INDEX spat_t ON [#Table1]
(
[PointGeog]
)
I ran some tests with 100k points on my laptop and the "join" took 3 minutes.
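One thing to watch for: SQL Server only allows a spatial index on a table that has a clustered primary key. A minimal sketch for the second table, assuming ID is unique and NOT NULL in #Table2 (the index name spat_t2 is made up for this example):

```sql
-- A spatial index requires a clustered primary key on the table.
-- Assumes #Table2.ID is unique and declared NOT NULL.
ALTER TABLE #Table2 ADD PRIMARY KEY CLUSTERED (ID);

-- Hypothetical index name; indexes the geography column used by STDistance.
CREATE SPATIAL INDEX spat_t2 ON #Table2
(
    PointGeog
);
```

The same pattern would apply to #Table1 before running the index statement shown above.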