Finding shortest mean distances per index from a distance matrix in R

I'm helping to put together a spatial R lab for a third-year class, and one of the tasks will be to identify the site that lies closest (i.e., has the shortest mean distance) to a set of multiple other sites.

I have a distance matrix, dist_m, that I produced using gdistance::costDistance. It looks something like this:

# Sample data
m <- matrix(c(2, 1, 8, 5,
              7, 6, 3, 4,
              9, 3, 2, 8,
              1, 3, 7, 4),
            nrow  = 4,
            ncol  = 4,
            byrow = TRUE)

# Sample distance matrix
dist_m <- dist(m)

When printed, dist_m looks like this:

          1         2         3
2  8.717798
3  9.899495  5.477226
4  2.645751  7.810250 10.246951
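A `dist` object stores only the lower triangle shown above. As a minimal sketch (recreating the sample data under the hypothetical name `pts` so it does not clash with `m`), `as.matrix()` expands it into the full symmetric matrix with zeros on the diagonal, so row i holds every distance from site i:

```r
# Recreate the sample points (pts is a hypothetical name used only here)
pts <- matrix(c(2, 1, 8, 5,
                7, 6, 3, 4,
                9, 3, 2, 8,
                1, 3, 7, 4),
              nrow = 4, byrow = TRUE)
dist_m <- dist(pts)

# Expand the lower-triangle dist object into a full 4 x 4 matrix
full <- as.matrix(dist_m)

isSymmetric(full)   # TRUE: distance i -> j equals distance j -> i
diag(full)          # all zeros: each site's distance to itself
full[4, ]           # every distance from site 4 to the other sites
```

This is why any per-site average has to look at a whole row (or column) of the full matrix, not just the lower triangle.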

Desired output: from this dist object I want to identify the index value (1, 2, 3 or 4) that has the lowest average distance. In this example, that is index 4, which has an average distance of 6.90. Ideally, I'd also like the mean distance itself (6.90) to be returned.
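The 6.90 figure is the average of site 4's distances to the other three sites (the last row of the printed output). In general, with n sites, the mean distance for site i averages over the n - 1 other sites:

```latex
\bar{d}_i = \frac{1}{n-1} \sum_{j \neq i} d_{ij}

\bar{d}_4 = \frac{d_{41} + d_{42} + d_{43}}{3}
          = \frac{2.645751 + 7.810250 + 10.246951}{3}
          = \frac{20.702952}{3} \approx 6.900984
```

The same calculation gives about 7.09, 7.34 and 8.54 for sites 1, 2 and 3, so site 4 is the minimum.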

I can find the mean distance for an individual index like this:

# Convert distance matrix to matrix
m = as.matrix(dist_m)

# Set diagonals and upper triangle to NA
m[upper.tri(m)] = NA
m[m == 0] = NA

# Calculate mean for index
mean(c(m[4,], m[,4]), na.rm = TRUE)

However, I would ideally like a solution that identifies the index with the minimum mean distance directly, rather than having to plug in index values manually (the actual dataset will be much larger than this).

As this is for a university class, I'd like to keep any solution as simple as possible: for-loops and apply functions are likely to be difficult to grasp for students with little R experience.
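The manual `mean(c(m[4,], m[,4]), na.rm = TRUE)` step above can be applied to every index at once with no loops or apply functions, using only `rowSums()`, `colSums()` and `which.min()`. A minimal sketch, assuming the same sample data (recreated under the hypothetical name `pts`) and the same lower-triangle preparation as in the question:

```r
# Sample points and distances from the question (pts is a hypothetical name)
pts <- matrix(c(2, 1, 8, 5,
                7, 6, 3, 4,
                9, 3, 2, 8,
                1, 3, 7, 4),
              nrow = 4, byrow = TRUE)
dist_m <- dist(pts)

# Same preparation as the question: keep only the lower triangle
m <- as.matrix(dist_m)
m[upper.tri(m)] <- NA
m[m == 0] <- NA

# Site i's distances sit partly in row i (to lower-numbered sites) and
# partly in column i (to higher-numbered sites): add both, for all sites at once
totals <- rowSums(m, na.rm = TRUE) + colSums(m, na.rm = TRUE)
counts <- rowSums(!is.na(m)) + colSums(!is.na(m))
mean_dist <- totals / counts

best <- which.min(mean_dist)  # index with the lowest mean distance
best
mean_dist[best]
```

One caveat of the `m[m == 0] <- NA` step: it would also discard a genuine zero distance between two distinct sites (for example, duplicated locations), not just the diagonal.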

Try this:

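m <- as.matrix(dist_m)
diag(m) <- NA                        # ignore only each site's zero self-distance
rMeans <- rowMeans(m, na.rm = TRUE)  # full rows: mean distance to all other sites
names(rMeans) <- NULL
which(rMeans == min(rMeans))
# [1] 4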
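`which(rMeans == min(rMeans))` returns every index that ties for the minimum, whereas base R's `which.min()` returns only the first one. A small illustration with a made-up vector `x` (hypothetical values):

```r
# Hypothetical mean distances with a tie for the minimum
x <- c(3.5, 1.2, 1.2)

which(x == min(x))  # 2 3  (all tied indices)
which.min(x)        # 2    (first index only)
```

With real computed means, exact ties are rare because of floating-point rounding, so the two usually agree.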

Or as a function:

minMeanDist <- function(x) {
  m <- as.matrix(x)                   # full symmetric matrix
  diag(m) <- NA                       # drop self-distances only
  rMeans <- rowMeans(m, na.rm = TRUE) # mean distance from each site to all others
  names(rMeans) <- NULL
  mmd <- min(rMeans)
  ind <- which(rMeans == mmd)         # all indices tied for the minimum
  list(index = ind, min_mean_dist = mmd)
}
minMeanDist(dist_m)
# $index
# [1] 4
# 
# $min_mean_dist
# [1] 6.900984
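For a lab, it may help students to see every site's mean distance rather than only the minimum. A minimal base-R sketch (no loops or apply), assuming the same sample data recreated under the hypothetical name `pts`, that ranks all sites:

```r
# Sample points from the question (pts is a hypothetical name)
pts <- matrix(c(2, 1, 8, 5,
                7, 6, 3, 4,
                9, 3, 2, 8,
                1, 3, 7, 4),
              nrow = 4, byrow = TRUE)
dist_m <- dist(pts)

# Full symmetric matrix; blank out only the self-distances on the diagonal
full <- as.matrix(dist_m)
diag(full) <- NA

# One row per site, sorted from closest (on average) to farthest
ranking <- data.frame(site = seq_len(nrow(full)),
                      mean_dist = rowMeans(full, na.rm = TRUE))
ranking <- ranking[order(ranking$mean_dist), ]
ranking
```

The first row of `ranking` is the answer to the original question; the remaining rows show how much farther the other sites are.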

If you want to use the tidyverse, this is one way:

library(tidyverse)

as.matrix(dist_m) %>%
    as.data.frame() %>%
    rownames_to_column(var = "start_node") %>%
    pivot_longer(-start_node, names_to = "end_node", values_to = "dist") %>% # go long
    filter(start_node != end_node) %>% # drop identity diagonal
    group_by(start_node) %>% # now summarise
    summarise(mean_dist = mean(dist)) %>%
    filter(mean_dist == min(mean_dist)) # choose minimum mean_dist

# A tibble: 1 x 2
  start_node mean_dist
       <chr>     <dbl>
1          4  6.900984

It's a little long, but the pipes make it easy to see what is happening at each step, and you get a nice output.

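Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.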

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM