使用distm（distVincentyEllipsoid）将点子集（相同ID）之间的平均大地测量距离并将结果存储在R中的新数据框中

Question

My database has the following structure: 我的数据库具有以下结构：

    > long <- c(13.2345, 14.2478, 16.2001, 11.2489, 17.4784, 27.6478, 14.2500, 12.2100, 11.2014, 12.2147)
    > lat <- c(47.1247, 48.2013, 41.2547, 41.2147, 40.3247, 46.4147, 42.4786, 41.2478, 48.2147, 47.2157)
    > hh_id <- 1:10
    > vill_id <- c(rep(100, 4), rep(101, 3), rep(102, 2), 103)

    > df <- matrix(c(long, lat, hh_id, vill_id), nrow = 10, ncol = 4)
    > colnames(df) <- c("longitude", "latitude", "hh_id", "vill_id") 
    > df <- as.data.frame(df)
    > df
       longitude latitude hh_id vill_id
       13.2345  47.1247     1     100
       14.2478  48.2013     2     100
       16.2001  41.2547     3     100
       11.2489  41.2147     4     100
       17.4784  40.3247     5     101
       27.6478  46.4147     6     101
       14.2500  42.4786     7     101
       12.2100  41.2478     8     102
       11.2014  48.2147     9     102
       12.2147  47.2157    10     103

hh_id - households IDs hh_id-家庭ID

vill_id - village IDs vill_id-村庄ID

Households with identical ID belong to the same village. 具有相同ID的家庭属于同一村庄。

My aim: calculate the mean distance between all points with the same vill_id and store the result in a new data frame: 我的目标是：计算具有相同vill_id的所有点之间的平均距离，并将结果存储在新的数据框中：

vill_id    mean_dist
100        587553.5
101        …………………
102        …………………
103        ………………

My approach: To calculate the geodetic distance between points I have used the distm command from the geosphere package (distVincentyEllipsoid should be most accurate) 我的方法：要计算点之间的测地距离，我已使用geosphere包中的distm命令（distVincentyEllipsoid应该是最准确的）

> library(geosphere)
> df_100 <- df[df$vill_id == 100, ]
> dist_100 <- distm(df_100, fun = distVincentyEllipsoid)
Error in .pointsToMatrix(p1) : Wrong length for a vector, should be 2 --> 
> df_100_2 <- df_100[, c(1, 2)]
> dist_100_2 <- distm(df_100_2, fun = distVincentyEllipsoid)
> dist_100_2
         [,1]     [,2]     [,3]     [,4]
[1,]      0.0 141844.7 693867.8 675556.9
[2,] 141844.7      0.0 787217.4 811777.4
[3,] 693867.8 787217.4      0.0 415056.6
[4,] 675556.9 811777.4 415056.6      0.0

So a symmetric distance matrix for all points with vill_id = 100 was generated. 因此，生成了vill_id = 100的所有点的对称距离矩阵。 To calculate the mean distance I need to to decompose this matrix (or drop all of the diagonal values (0)). 要计算平均距离，我需要分解该矩阵（或除去所有对角线值（0））。

> diag(dist_100_2) = NA
> dist_100_2_final <- dist_100_2[!is.na(dist_100_2)]
> dist_100_2_final
 [1] 141844.7 693867.8 675556.9 141844.7 787217.4 811777.4 693867.8 787217.4 415056.6 675556.9
[11] 811777.4 415056.6
> mean(dist_100_2_final)
[1] 587553.5 (in m)

So far so good. 到现在为止还挺好。 Now I need to create a new dataframe which stores the mean distances for all subsets with the same ID (my original database has over 200 villages (vill_id) and almost 2000 households (hh_id)). 现在，我需要创建一个新的数据框，以存储具有相同ID的所有子集的平均距离（我的原始数据库有200多个村庄（vill_id）和近2000户（hh_id））。 Can you please help me how to finish the code? 您能帮我完成代码吗？ I think I have to use loops (or maybe there is another package to solve this problem)? 我想我必须使用循环（或者也许有另一个软件包来解决这个问题）？ Many thanks for your help. 非常感谢您的帮助。

Yesterday I have posted similar question with the difference that the mean_dist were already part of my original dataframe (computed in ArcGIS) but now I want to calculate these in R to compare the results. 昨天我发布了类似的问题，不同之处在于mean_dist已经是我的原始数据框的一部分（在ArcGIS中计算），但现在我想在R中计算这些值以比较结果。 I have tried to implement the recommended codes from my previous question but without success. 我尝试实施上一个问题中推荐的代码，但没有成功。

Answer 1

Consider base R's by since you need to run an operation across different levels of factors (ie, vill_id ). 考虑基数R by因为您需要在不同级别的因子（即vill_id ）上运行操作。 Inside by , you can call a defined or anonymous function which will return a list of dataframes that you can row bind back to one dataframe: 在by ，您可以调用定义的或匿名函数，该函数将返回一个数据框列表，您可以将其行绑定回一个数据框：

dfList <- by(df, df[c("vill_id")], FUN = function(i){
     sub <- i[, c(1, 2)]
     tmp <- distm(sub, fun = distVincentyEllipsoid)
     diag(tmp) = NA
     i$mean_dist <- mean(tmp[!is.na(tmp)])                  # NEW COLUMN ADDED
     return(i)
})

finaldf <- do.call(rbind, dfList)

Should you need vill_id and hh_id subset, add to the factor list: 如果您需要vill_id和hh_id子集，请添加到因子列表中：

dfList <- by(df, df[c("vill_id", "hh_id")], FUN = function(i){ ... })

And if you only need vill_id and mean_dist returned from function, change return value: 如果只需要从函数返回vill_id和mean_dist ，则更改返回值：

newdf <- unique(i[c("vill_id", "mean_dist")]
return(newdf)

Specifically, the following block of code: 具体来说，下面的代码块：

df_100 <- df[df$vill_id == 100, ]                            # BY REPLACES THIS LINE
df_100_2 <- df_100[, c(1, 2)]
dist_100_2 <- distm(df_100_2, fun = distVincentyEllipsoid)                 
diag(dist_100_2) = NA
dist_100_2_final <- dist_100_2[!is.na(dist_100_2)]
mean(dist_100_2_final)

Is translated as the following where i is the by function variable: 转换为以下内容，其中i是by函数变量：

sub <- i[,c(1, 2)]
tmp <- distm(sub, fun = distVincentyEllipsoid)
diag(tmp) = NA
i$mean_dist <- mean(tmp[!is.na(tmp)])

Answer 2

Another way would be to use lapply() . 另一种方法是使用lapply() 。 I basically revised your code. 我基本上修改了您的代码。 One thing I added was to split your data by vill_id and create a list. 我添加的一件事是按vill_id拆分数据并创建一个列表。 Then, I applied your chunk of code for calculating distance to each split data frame in lapply() . 然后，我将您的代码块应用于lapply()每个拆分数据帧的距离计算。 Finally, I created a data frame with mean values. 最后，我创建了一个带有平均值的数据框。

library(geosphere)

mylist <- split(df, f = df$vill_id)

unlist(lapply(mylist, function(x){

        foo <- x[, 1:2]
        foo <- distm(foo, fun = distVincentyEllipsoid)
        diag(foo) = NA
        out <- foo[!is.na(foo)]
        average <- mean(out)
        average
      })
) -> mean_dist

data.frame(vill_id = unique(df$vill_id),
           mean_dist = mean_dist)

#    vill_id mean_dist
#100     100  587553.5
#101     101  858785.6
#102     102  778299.1
#103     103       NaN

使用distm（distVincentyEllipsoid）将点子集（相同ID）之间的平均大地测量距离并将结果存储在R中的新数据框中

问题描述

2 个解决方案

解决方案1
0 已采纳 2017-05-21 14:12:48

解决方案2
0 2017-05-21 14:30:16

使用distm（distVincentyEllipsoid）将点子集（相同ID）之间的平均大地测量距离并将结果存储在R中的新数据框中

问题描述

2 个解决方案

解决方案1 0 已采纳 2017-05-21 14:12:48

解决方案2 0 2017-05-21 14:30:16

解决方案1
0 已采纳 2017-05-21 14:12:48

解决方案2
0 2017-05-21 14:30:16