What is the fastest way to calculate and return the closest items in Table B from Table A within a set radius

Question

I have a very large number of lat/long coordinates in Table 1, as well as in Table 2. For example, let's say there are 100,000 coordinates in each table. For each unique item in Table 1, I need to return the closest coordinate in Table 2, as long as it is within a set maximum distance (let's say, 100 meters). My expected output is up to 100,000 rows, culled down to the pairs that are within 100 meters of each other.

I am fairly familiar with the Geometry and Geography parts of MSSQL, and would traditionally approach this with something like the following:

Select
Table1ID = T1.ID,
Table2ID = T2.ID,
Distance = T1.PointGeog.STDistance(T2.PointGeog),
Keep = 0
into #Distance 
From #Table1 T1
   cross join #Table2 T2
where T1.PointGeog.STDistance(T2.PointGeog) <= 100

which would return all items from Table2 that are within 100 meters of an item in Table1.

Then, to limit to only the closest items, I could:

Update #Distance
 set Keep = 1
from #Distance D 
   inner join 
   (select shortestDist = min(Distance), Table1ID from #Distance GROUP BY 
    Table1ID) A
    on A.Table1ID = D.Table1ID and A.shortestDist = D.Distance

and then delete anything where Keep <> 1.

This works, however it takes absolutely forever. The cross join creates an absurd number of calculations that SQL needs to handle, which results in ~9 minute queries on MSSQL 2016. I can limit the range of the portion of Table 1 and Table 2 that I compare with SOME criteria, but really not much. I'm just really not sure how I could make the process quicker. Ultimately, I just need: closest item, distance from T2 to T1.

I have played around with a few different solutions, but I wanted to see if the SO community has any additional ideas on how I could better optimize something like this.

Answer 1

Try CROSS APPLY:

SELECT 
    T1.ID, TT2.ID, T1.PointGeog.STDistance(TT2.PointGeog)
FROM #Table1 as T1
CROSS APPLY (SELECT TOP 1 T2.ID, T2.PointGeog 
  FROM #Table2 as T2
  WHERE T1.PointGeog.STDistance(T2.PointGeog) <= 100
  ORDER BY T1.PointGeog.STDistance(T2.PointGeog) ASC) as TT2

Answer 2

I have played around with a new option, and I think this is the fastest I have gotten the calculation: about 3 minutes.

I changed Table1 to be:

select
ID,
PointGeog,
Buffer = PointGeom.STBuffer(8.997741566866716e-4)
into #Table1

where the buffer is 100/111139 (100 meters converted to degrees, at roughly 111,139 meters per degree)
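For reference, the arithmetic behind that constant can be written out as below. This is only a sketch: it assumes PointGeom holds plain longitude/latitude degrees in a planar geometry column and takes the 111,139 meters-per-degree figure from the answer as given. Under that assumption, a buffer of fixed size in degrees reaches about 100 meters north-south but less east-west as latitude increases, so some pairs that are truly within 100 meters may fall outside it.

```latex
\[
r = \frac{100\ \text{m}}{111139\ \text{m/degree}} \approx 8.9977 \times 10^{-4}\ \text{degrees}
\]
\[
\text{east-west reach at latitude } \varphi \approx 100\ \text{m} \times \cos\varphi
\qquad (\text{about } 50\ \text{m at } \varphi = 60^\circ)
\]
```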

and then

if object_id('tempdb.dbo.#Distance') is not null
drop table #Distance 
Select 
T1ID = T1.ID,
T1Geog = T1.PointGeog,
T2ID = T2.ID,
T2Geog = T2.PointGeog,
DistanceMeters = cast(null as float),
DistanceMiles = cast(null as float),
Keep = 0
Into #Distance
From #Table1 T1
    cross join #Table2 T2
Where T1.Buffer.STIntersects(T2.PointGeom) = 1

which does not calculate the distance, but first culls the dataset to anything within 100 meters. I can then pass an update statement to calculate the distance on a substantially more manageable dataset.
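The update statement is not shown in the answer; a minimal sketch of what it might look like follows. It assumes the geography columns use an SRID whose unit is the meter (such as 4326), so that STDistance returns meters, and it re-applies the exact 100 meter cutoff because the degree-based buffer is only an approximation.

```sql
-- Fill in the distances for the rows that survived the buffer filter.
-- Assumption: T1Geog/T2Geog use an SRID measured in meters (e.g. 4326).
UPDATE #Distance
SET DistanceMeters = T1Geog.STDistance(T2Geog),
    DistanceMiles  = T1Geog.STDistance(T2Geog) / 1609.344;  -- meters per mile

-- The buffer was sized in degrees, so enforce the exact cutoff afterwards.
DELETE FROM #Distance
WHERE DistanceMeters > 100;
```

The Keep update from the question can then be run against DistanceMeters to retain only the closest Table2 row per T1ID.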

Answer 3

Create a spatial index on the geom column in both tables and performance shouldn't be too bad. Something like:


CREATE SPATIAL INDEX spat_t ON  [#Table1]
    (
        [PointGeog]
    )

I ran some tests with 100k points on my laptop and it took 3 minutes to "join".
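A fuller sketch of the setup, under some assumptions: SQL Server only allows a spatial index on a table that has a clustered primary key, so this assumes ID is non-null and unique in both temp tables. The index names are placeholders, and the join is the radius filter from the question rewritten as an inner join; whether the optimizer actually picks the index should be checked in the execution plan.

```sql
-- Assumption: ID is NOT NULL and unique in both tables.
-- A spatial index requires a clustered primary key on the table.
ALTER TABLE #Table1 ADD PRIMARY KEY CLUSTERED (ID);
ALTER TABLE #Table2 ADD PRIMARY KEY CLUSTERED (ID);

-- Hypothetical index names; default tessellation settings are used.
CREATE SPATIAL INDEX spat_t1 ON #Table1 (PointGeog);
CREATE SPATIAL INDEX spat_t2 ON #Table2 (PointGeog);

-- Radius join: STDistance(...) <= number is a predicate form that
-- a spatial index can support.
SELECT T1.ID AS Table1ID,
       T2.ID AS Table2ID,
       T1.PointGeog.STDistance(T2.PointGeog) AS Distance
FROM #Table1 AS T1
    INNER JOIN #Table2 AS T2
        ON T1.PointGeog.STDistance(T2.PointGeog) <= 100;
```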

3 answers

Answer 1 0 2019-02-13 15:18:46

Answer 2 0 2019-02-13 15:27:49

Answer 3 0 2021-06-23 19:22:12
