How to measure distance between points in separate data frames?

I created two data frames with geometry columns (of POINT type). Now I would like to calculate the distance between each pair of corresponding points, e.g. the point in the 1st row of the first df with the point in the 1st row of the second df, and so on. Here are my data frames:

df1 <- table %>%
  st_as_sf(coords = c("lonCust","latCust"), crs = 4326)

df2 <- table %>%
  st_as_sf(coords = c("lonApp","latApp"), crs = 4326)

I used st_distance:

distance <- st_distance(df1$geometry,df2$geometry)

but I got a matrix where the distance is calculated for every pair of points from the two geometry columns:

           [,1]      [,2]        [,3]         [,4]        [,5]  ...
[1,]   139.7924 7735.5718 15225.02995   558.104089  1016.58121
[2,]  8503.0544  755.2915  8764.75396  7957.289600  8788.02800
[3,] 15306.5855 9336.9008    18.96914 14876.589918 15929.51643
[4,]   548.3045 7232.0164 14898.70637     8.094068  1078.38236
[5,]   911.5635 8084.3086 15993.36365  1127.730022    46.97799
.
.

I wanted the distance in a single column, calculated only between corresponding geometry rows:

           [,1]     
[1,]   139.7924 
[2,]  8503.0544
[3,] 15306.5855 
[4,]   548.3045
[5,]   911.5635
.
.

I read about the geosphere package, but sf has the very nice st_distance function for measuring distance and I wanted to use it. Most importantly, do I need to join those data frames first? A simple inner_join from dplyr doesn't allow joining two spatial data frames, and st_join is not an option here because I don't want to join by geometries (the geometries in the two data frames are totally different).

As @mrhellmann mentioned, you can just add by_element = TRUE to the st_distance call and that should work. If speed is still an issue, I recommend using distGeo() from the geosphere package. But be sure to check the documentation to confirm that your data is appropriate for this function.

library(geosphere)
library(tidyverse)
library(sf)
library(units)  # for set_units()

# st_as_sf() drops lonCust/latCust by default; lonApp/latApp stay as ordinary columns.
df1 <- table %>%
  st_as_sf(coords = c("lonCust","latCust"), crs = 4326)

df_crs4326 <- df1 %>%
  mutate(
    # A POINT is stored as c(x, y), i.e. element 1 is longitude and element 2 is latitude.
    lonCust = map(geometry, 1) %>% unlist(),
    latCust = map(geometry, 2) %>% unlist(),
    # geometry_2 = st_as_sfc(coords = c("lonApp","latApp"), crs = 4326)
    ) %>%
  mutate(
    # distGeo() takes two-column (lon, lat) matrices and returns one distance in metres per row.
    distance_to_next = distGeo(cbind(lonCust, latCust), cbind(lonApp, latApp)) %>% set_units(m),
    # distance_2 = st_distance(geometry, geometry_2, by_element = TRUE)
    )

Note that I have not tested this on reproducible data, so I am not sure the commented-out parts work.
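
For the element-wise approach mentioned at the start of this answer, here is a minimal sketch on made-up data. The `pts` table and its coordinate values are invented for illustration; only the column names follow the question. It assumes both sf objects have the same number of rows in the same order, in which case no join is needed.

```r
library(sf)

# Hypothetical stand-in for `table`: one row per customer/app pair.
pts <- data.frame(
  lonCust = c(21.01, 19.94, 17.03),
  latCust = c(52.23, 50.06, 51.11),
  lonApp  = c(21.02, 19.95, 18.00),
  latApp  = c(52.24, 50.05, 51.00)
)

df1 <- st_as_sf(pts, coords = c("lonCust", "latCust"), crs = 4326)
df2 <- st_as_sf(pts, coords = c("lonApp", "latApp"), crs = 4326)

# by_element = TRUE pairs row i of df1 with row i of df2 and returns
# one distance per row instead of an n x n matrix.
pts$distance <- st_distance(df1, df2, by_element = TRUE)

# The same values sit on the diagonal of the full matrix.
full <- units::drop_units(st_distance(df1, df2))
same_as_diagonal <- all.equal(as.numeric(pts$distance), diag(full))
print(pts$distance)
```

The result carries units (metres for EPSG:4326), so wrap it in as.numeric() if a plain numeric column is needed.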

Super Fast Vectorised Computation

This method works by:

  1. Projecting the (longitude, latitude) coordinates to a coordinate system that is equidistant for your region of interest. (An equidistant coordinate system approximately preserves distances between points within that region, so you can use basic planar geometry to calculate distances.)
  2. Converting the geometries to a base R matrix with X and Y columns.
  3. Finally, using Pythagoras's theorem to calculate the distance between each pair of points.

Get the Coordinate Reference System (CRS) right first

For this to work, you need an equidistant CRS. This means that, across the area of interest, distances are preserved (at least approximately).

Let's say you were interested in calculating distances across the USA: you could use ESRI:102005 (USA Contiguous Equidistant Conic, often written as EPSG:102005). See this GIS answer for more details. The choice of CRS here is crucial, so make sure you get it right, otherwise the answer will be nonsense.

Applied to your example

library(sf)
library(dplyr)  # for %>%

crs.source = 4326
crs.dest = st_crs("+proj=eqdc +lat_0=39 +lon_0=-96 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs")

# coords1 and coords2 are matrices with columns X and Y and one row per point, in the `crs.dest` coordinate system.
coords1 <- table %>%
  st_as_sf(coords = c("lonCust","latCust"), crs = crs.source) %>%
  st_transform(crs.dest) %>%
  st_coordinates()
  
coords2 <- table %>%
  st_as_sf(coords = c("lonApp","latApp"), crs = crs.source) %>%
  st_transform(crs.dest) %>%
  st_coordinates()

# This is a vectorised computation, and so should be instant for a mere 25,000 rows :-)
table$distances = local({
  x_diff = coords1[, 'X'] - coords2[, 'X']
  y_diff = coords1[, 'Y'] - coords2[, 'Y']
  return(sqrt(x_diff^2 + y_diff^2))
})
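
Since the result depends on the CRS suiting your region, it may be worth comparing the planar distances against geodesic ones on a small sample before trusting them. The following is a rough sketch of such a check; the `pts` table and its values are invented for illustration, and the projection string is the one used above.

```r
library(sf)

# Hypothetical points inside the contiguous USA (values invented for illustration).
pts <- data.frame(
  lonCust = c(-96.0, -118.2, -74.0),
  latCust = c(39.0, 34.1, 40.7),
  lonApp  = c(-95.9, -118.0, -73.9),
  latApp  = c(39.1, 34.0, 40.8)
)

crs.dest <- st_crs("+proj=eqdc +lat_0=39 +lon_0=-96 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs")

sf1 <- st_as_sf(pts, coords = c("lonCust", "latCust"), crs = 4326)
sf2 <- st_as_sf(pts, coords = c("lonApp", "latApp"), crs = 4326)

# Planar distances in the projected CRS (Pythagoras on X/Y).
xy1 <- st_coordinates(st_transform(sf1, crs.dest))
xy2 <- st_coordinates(st_transform(sf2, crs.dest))
planar <- sqrt((xy1[, "X"] - xy2[, "X"])^2 + (xy1[, "Y"] - xy2[, "Y"])^2)

# Geodesic distances on the original lon/lat points, used as the reference.
geodesic <- as.numeric(st_distance(sf1, sf2, by_element = TRUE))

# Relative error per pair; large values suggest the CRS does not suit the region.
rel_error <- abs(planar - geodesic) / geodesic
print(rel_error)
```

If the relative error is larger than you can tolerate, pick a projection centred on your own region or fall back to st_distance with by_element = TRUE.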
