Average geodetic distance between subsets of points (same ID) using distm(distVincentyEllipsoid) and storing the results in a new dataframe in R

My database has the following structure:

    > long <- c(13.2345, 14.2478, 16.2001, 11.2489, 17.4784, 27.6478, 14.2500, 12.2100, 11.2014, 12.2147)
    > lat <- c(47.1247, 48.2013, 41.2547, 41.2147, 40.3247, 46.4147, 42.4786, 41.2478, 48.2147, 47.2157)
    > hh_id <- 1:10
    > vill_id <- c(rep(100, 4), rep(101, 3), rep(102, 2), 103)

    > df <- matrix(c(long, lat, hh_id, vill_id), nrow = 10, ncol = 4)
    > colnames(df) <- c("longitude", "latitude", "hh_id", "vill_id") 
    > df <- as.data.frame(df)
    > df
       longitude latitude hh_id vill_id
       13.2345  47.1247     1     100
       14.2478  48.2013     2     100
       16.2001  41.2547     3     100
       11.2489  41.2147     4     100
       17.4784  40.3247     5     101
       27.6478  46.4147     6     101
       14.2500  42.4786     7     101
       12.2100  41.2478     8     102
       11.2014  48.2147     9     102
       12.2147  47.2157    10     103

hh_id - household IDs

vill_id - village IDs

Households with the same vill_id belong to the same village.

My aim: calculate the mean distance between all points with the same vill_id and store the result in a new data frame:

vill_id    mean_dist
100        587553.5
101        …………………
102        …………………
103        ………………

My approach: to calculate the geodetic distance between points I used the distm function from the geosphere package (distVincentyEllipsoid should be the most accurate):

> library(geosphere)
> df_100 <- df[df$vill_id == 100, ]
> dist_100 <- distm(df_100, fun = distVincentyEllipsoid)
Error in .pointsToMatrix(p1) : Wrong length for a vector, should be 2 --> 
> df_100_2 <- df_100[, c(1, 2)]
> dist_100_2 <- distm(df_100_2, fun = distVincentyEllipsoid)
> dist_100_2
         [,1]     [,2]     [,3]     [,4]
[1,]      0.0 141844.7 693867.8 675556.9
[2,] 141844.7      0.0 787217.4 811777.4
[3,] 693867.8 787217.4      0.0 415056.6
[4,] 675556.9 811777.4 415056.6      0.0

So a symmetric distance matrix for all points with vill_id = 100 was generated. To calculate the mean distance I need to decompose this matrix (or drop all of the diagonal values (0)).

> diag(dist_100_2) = NA
> dist_100_2_final <- dist_100_2[!is.na(dist_100_2)]
> dist_100_2_final
 [1] 141844.7 693867.8 675556.9 141844.7 787217.4 811777.4 693867.8 787217.4 415056.6 675556.9
[11] 811777.4 415056.6
> mean(dist_100_2_final)
[1] 587553.5 (in m)

So far so good. Now I need to create a new data frame that stores the mean distance for every subset with the same ID (my original database has over 200 villages (vill_id) and almost 2000 households (hh_id)). Can you please help me finish the code? I think I have to use loops (or maybe there is another package that solves this problem)? Many thanks for your help.

Yesterday I posted a similar question, with the difference that mean_dist was already part of my original data frame (computed in ArcGIS); now I want to calculate these values in R to compare the results. I tried to implement the code recommended in my previous question, but without success.

Consider base R's by(), since you need to run an operation across the levels of a factor (i.e., vill_id). Inside by(), you can call a named or anonymous function on each subset; by() returns a list of data frames that you can row-bind back into one data frame:

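library(geosphere)

dfList <- by(df, df[c("vill_id")], FUN = function(i) {
     sub <- i[, c(1, 2)]                             # longitude, latitude
     tmp <- distm(sub, fun = distVincentyEllipsoid)  # pairwise distances in metres
     diag(tmp) <- NA                                 # drop the zero self-distances
     i$mean_dist <- mean(tmp[!is.na(tmp)])           # NEW COLUMN ADDED
     return(i)
})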

finaldf <- do.call(rbind, dfList)

Should you need to subset by both vill_id and hh_id, add hh_id to the factor list:

dfList <- by(df, df[c("vill_id", "hh_id")], FUN = function(i){ ... })

And if you only need vill_id and mean_dist returned from the function, change the return value:

newdf <- unique(i[c("vill_id", "mean_dist")])
return(newdf)
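
Putting those pieces together, a minimal self-contained sketch of the by() approach that returns one row per village might look like the following. It rebuilds the example data from the question with data.frame() and selects the coordinate columns by name; the name village_means is only illustrative. For village 100 it should reproduce the roughly 587553.5 m computed by hand in the question.

```r
library(geosphere)  # provides distm() and distVincentyEllipsoid()

# Example data from the question
df <- data.frame(
  longitude = c(13.2345, 14.2478, 16.2001, 11.2489, 17.4784,
                27.6478, 14.2500, 12.2100, 11.2014, 12.2147),
  latitude  = c(47.1247, 48.2013, 41.2547, 41.2147, 40.3247,
                46.4147, 42.4786, 41.2478, 48.2147, 47.2157),
  hh_id     = 1:10,
  vill_id   = c(rep(100, 4), rep(101, 3), rep(102, 2), 103)
)

# Build one summary row per village instead of returning every household
dfList <- by(df, df["vill_id"], FUN = function(i) {
  tmp <- distm(i[, c("longitude", "latitude")], fun = distVincentyEllipsoid)
  diag(tmp) <- NA  # drop the zero self-distances
  data.frame(vill_id = i$vill_id[1], mean_dist = mean(tmp[!is.na(tmp)]))
})

# Row-bind the per-village rows into a single data frame
village_means <- do.call(rbind, dfList)
village_means
```

Selecting the columns by name rather than by position keeps the code working if the column order of df changes.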

Specifically, the following block of code:

df_100 <- df[df$vill_id == 100, ]                            # BY REPLACES THIS LINE
df_100_2 <- df_100[, c(1, 2)]
dist_100_2 <- distm(df_100_2, fun = distVincentyEllipsoid)                 
diag(dist_100_2) = NA
dist_100_2_final <- dist_100_2[!is.na(dist_100_2)]
mean(dist_100_2_final)

is translated as the following, where i is the subset data frame that by() passes to the function:

sub <- i[,c(1, 2)]
tmp <- distm(sub, fun = distVincentyEllipsoid)
diag(tmp) = NA
i$mean_dist <- mean(tmp[!is.na(tmp)])
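
As a side note, because the distance matrix is symmetric, every pair appears twice off the diagonal, so averaging only the upper triangle gives the same mean without the NA step. The base-R sketch below checks this on the (rounded) matrix for village 100 printed in the question; the variable names are illustrative.

```r
# Distance matrix for village 100, copied (rounded) from the question
d <- matrix(c(
       0.0, 141844.7, 693867.8, 675556.9,
  141844.7,      0.0, 787217.4, 811777.4,
  693867.8, 787217.4,      0.0, 415056.6,
  675556.9, 811777.4, 415056.6,      0.0
), nrow = 4, byrow = TRUE)

# Approach from the question: blank the diagonal, average what is left
d_na <- d
diag(d_na) <- NA
mean_offdiag <- mean(d_na, na.rm = TRUE)

# Shortcut: each pair occurs exactly once in the upper triangle
mean_upper <- mean(d[upper.tri(d)])

all.equal(mean_offdiag, mean_upper)
round(mean_upper, 1)
```

Both values round to the 587553.5 m reported in the question, so inside the by() function the two lines handling diag() and is.na() could be replaced by mean(tmp[upper.tri(tmp)]).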

Another way would be to use lapply(). I basically revised your code. One thing I added was to split your data by vill_id into a list. Then I applied your distance-calculation code to each split data frame inside lapply(). Finally, I created a data frame with the mean values.

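library(geosphere)

# Split the data into one data frame per village
mylist <- split(df, f = df$vill_id)

# Mean pairwise distance (in metres) within each village
mean_dist <- unlist(lapply(mylist, function(x) {
        foo <- distm(x[, 1:2], fun = distVincentyEllipsoid)
        diag(foo) <- NA                # drop the zero self-distances
        mean(foo[!is.na(foo)])
}))

# split() orders the groups by sorted vill_id, so take the IDs from the
# names of the result rather than from unique(df$vill_id)
data.frame(vill_id = as.numeric(names(mean_dist)),
           mean_dist = mean_dist)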

#    vill_id mean_dist
#100     100  587553.5
#101     101  858785.6
#102     102  778299.1
#103     103       NaN
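
The NaN for village 103 is expected: it has a single household, so there are no pairwise distances to average. The base-R sketch below shows why and, as an optional step that is not part of the original answer, marks such villages as NA; the data frame res just re-enters the values printed above.

```r
# Results as printed in the answer above
res <- data.frame(vill_id   = c(100, 101, 102, 103),
                  mean_dist = c(587553.5, 858785.6, 778299.1, NaN))

# A 1 x 1 distance matrix has nothing left once the diagonal is dropped
single <- matrix(0, nrow = 1, ncol = 1)
diag(single) <- NA
pairwise <- single[!is.na(single)]
length(pairwise)  # number of pairwise distances for a lone household
mean(pairwise)    # mean of an empty numeric vector

# Optional: report villages without any pair as missing instead of NaN
res$mean_dist[is.nan(res$mean_dist)] <- NA
res
```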
