How to loop over columns and calculate distance using lat and long in R
I have a dataframe with the lat and long of various areas in a city.
A subset of the dataframe:
structure(list(Locality = c("ADYAR", "AMBATTUR", "KOLATHUR",
"AVADI", "AGARAM", "ANNA NAGAR WEST", "CHROMPET", "MADIPAKKAM",
"MOGAPPAIR", "MYLAPORE"), Transactions = c(607, 569, 498, 409,
103, 257, 303, 343, 316, 205), lon = c(80.2564957, 80.1547844,
80.2121332, 80.0969511, 80.2294222, 80.2017906, 80.1461663, 80.1960832,
80.1749627, 80.2676303), lat = c(13.0011774, 13.1143393, 13.1239583,
13.1067448, 13.1116221, 13.0861782, 12.951611, 12.9647462, 13.0837224,
13.0367914), Ambatturlon = c(80.15478, 80.15478, 80.15478, 80.15478,
80.15478, 80.15478, 80.15478, 80.15478, 80.15478, 80.15478),
Ambatturlat = c(13.11434, 13.11434, 13.11434, 13.11434, 13.11434,
13.11434, 13.11434, 13.11434, 13.11434, 13.11434), Guindylon = c(80.22064,
80.22064, 80.22064, 80.22064, 80.22064, 80.22064, 80.22064,
80.22064, 80.22064, 80.22064), Guindylat = c(13.00666, 13.00666,
13.00666, 13.00666, 13.00666, 13.00666, 13.00666, 13.00666,
13.00666, 13.00666), OMRlon = c(80.22915, 80.22915, 80.22915,
80.22915, 80.22915, 80.22915, 80.22915, 80.22915, 80.22915,
80.22915), OMRlat = c(12.91261, 12.91261, 12.91261, 12.91261,
12.91261, 12.91261, 12.91261, 12.91261, 12.91261, 12.91261
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
>
> df
# A tibble: 10 x 10
Locality Transactions lon lat Ambatturlon Ambatturlat Guindylon Guindylat OMRlon OMRlat
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ADYAR 607 80.3 13.0 80.2 13.1 80.2 13.0 80.2 12.9
2 AMBATTUR 569 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9
3 KOLATHUR 498 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9
4 AVADI 409 80.1 13.1 80.2 13.1 80.2 13.0 80.2 12.9
5 AGARAM 103 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9
6 ANNA NAGAR WEST 257 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9
7 CHROMPET 303 80.1 13.0 80.2 13.1 80.2 13.0 80.2 12.9
8 MADIPAKKAM 343 80.2 13.0 80.2 13.1 80.2 13.0 80.2 12.9
9 MOGAPPAIR 316 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9
10 MYLAPORE 205 80.3 13.0 80.2 13.1 80.2 13.0 80.2 12.9
>
Columns Ambatturlon, Ambatturlat, Guindylon, etc. hold the coordinates of reference localities within the same city. I need to calculate the distance between each locality and each of the localities given in the columns: (Ambatturlon, Ambatturlat), (Guindylon, Guindylat), (OMRlon, OMRlat).
I learnt that we can use the distHaversine function from the geosphere package for this.
I tried it for the first locality using the code below:
> df %>%
+ rowwise() %>%
+ mutate(disttoAmbattur = distHaversine(c(lon, lat), c(Ambatturlon, Ambatturlat)))
Source: local data frame [10 x 11]
Groups: <by row>
# A tibble: 10 x 11
Locality Transactions lon lat Ambatturlon Ambatturlat Guindylon Guindylat OMRlon OMRlat disttoAmbattur
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ADYAR 607 80.3 13.0 80.2 13.1 80.2 13.0 80.2 12.9 16744.
2 AMBATTUR 569 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9 0.483
3 KOLATHUR 498 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9 6309.
4 AVADI 409 80.1 13.1 80.2 13.1 80.2 13.0 80.2 12.9 6326.
5 AGARAM 103 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9 8098.
6 ANNA NAGAR WEST 257 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9 5984.
7 CHROMPET 303 80.1 13.0 80.2 13.1 80.2 13.0 80.2 12.9 18139.
8 MADIPAKKAM 343 80.2 13.0 80.2 13.1 80.2 13.0 80.2 12.9 17245.
9 MOGAPPAIR 316 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9 4050.
10 MYLAPORE 205 80.3 13.0 80.2 13.1 80.2 13.0 80.2 12.9 14975.
>
I could do the same manually, but there are many such locality columns. Could someone let me know whether I can loop through the other localities and add a new column similar to disttoAmbattur for each lat/long pair among the locality columns?
We can collect the names of all the lat and lon columns into two vectors and use map2 to pass them in parallel. Calculate distHaversine for each pair and add the results as new columns in the original dataframe.
library(dplyr)
library(purrr)

# Reference-locality columns: names ending in "lon"/"lat", excluding the plain lon/lat columns
lon_col <- grep('.+lon$', names(df), value = TRUE)
lat_col <- grep('.+lat$', names(df), value = TRUE)

df %>%
  bind_cols(map2_dfc(lon_col, lat_col, ~{
    # e.g. "Ambatturlon" -> "distAmbattur"
    newcol <- paste0('dist', sub('lon$', '', .x))
    df %>%
      rowwise() %>%
      transmute(!!newcol := geosphere::distHaversine(c(lon, lat),
                                                     c(.data[[.x]], .data[[.y]])))
  }))
# A tibble: 10 x 13
# Locality Transactions lon lat Ambatturlon Ambatturlat Guindylon Guindylat OMRlon OMRlat distAmbattur distGuindy distOMR
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 ADYAR 607 80.3 13.0 80.2 13.1 80.2 13.0 80.2 12.9 16744. 3937. 10296.
# 2 AMBATTUR 569 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9 0.483 13953. 23861.
# 3 KOLATHUR 498 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9 6309. 13090. 23599.
# 4 AVADI 409 80.1 13.1 80.2 13.1 80.2 13.0 80.2 12.9 6326. 17437. 25935.
# 5 AGARAM 103 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9 8098. 11723. 22154.
# 6 ANNA NAGAR WEST 257 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9 5984. 9085. 19548.
# 7 CHROMPET 303 80.1 13.0 80.2 13.1 80.2 13.0 80.2 12.9 18139. 10140. 9995.
# 8 MADIPAKKAM 343 80.2 13.0 80.2 13.1 80.2 13.0 80.2 12.9 17245. 5373. 6823.
# 9 MOGAPPAIR 316 80.2 13.1 80.2 13.1 80.2 13.0 80.2 12.9 4050. 9906. 19934.
#10 MYLAPORE 205 80.3 13.0 80.2 13.1 80.2 13.0 80.2 12.9 14975. 6101. 14440.
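Since distHaversine also accepts two-column matrices of (lon, lat) points and computes the distances row by row, the rowwise() step is not strictly needed. Below is a minimal base-R sketch of the same idea. It assumes the reference columns always come in matching <name>lon / <name>lat pairs, and the three-row df is only a cut-down stand-in for the data in the question:

```r
library(geosphere)

# Cut-down stand-in for the question's data (three rows, two reference localities)
df <- data.frame(
  Locality    = c("ADYAR", "AMBATTUR", "KOLATHUR"),
  lon         = c(80.2564957, 80.1547844, 80.2121332),
  lat         = c(13.0011774, 13.1143393, 13.1239583),
  Ambatturlon = 80.15478, Ambatturlat = 13.11434,
  Guindylon   = 80.22064, Guindylat   = 13.00666
)

# Reference longitude columns end in "lon" but are not the plain "lon" column;
# the matching latitude column is derived from the same prefix
lon_col <- grep(".+lon$", names(df), value = TRUE)
lat_col <- sub("lon$", "lat", lon_col)

# Each row's own position as a two-column (lon, lat) matrix
origin <- cbind(df$lon, df$lat)

for (i in seq_along(lon_col)) {
  newcol <- paste0("dist", sub("lon$", "", lon_col[i]))
  target <- cbind(df[[lon_col[i]]], df[[lat_col[i]]])
  # Vectorised over rows: one distance in metres per row
  df[[newcol]] <- distHaversine(origin, target)
}

df
```

Deriving lat_col from lon_col keeps each pair aligned even if the columns are not stored in the same order, and avoiding rowwise() should matter mostly when the dataframe has many rows.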