R - Create a list of vectors for all combinations based on a threshold

Question

Problem: I have a dataframe (see example data) that contains the distances between spatial points ('siteA' & 'siteB') and whether they are too close to each other or not ('close'). I need a way to combine the sites that are close to each other into one vector.

In the example data, site 1 is close to site 3 but far from site 2. However, site 3 is close to site 2. Therefore, I need a way to combine these into one vector per group, so that the output has sites 1, 2 and 3 in one vector and sites 4 and 5 in another. All the vectors are then combined in a list.

# ----------------------------- #
# --- Example table of data --- #
# ----------------------------- #
   siteA siteB     distance close
1      1     2   2913.35364 FALSE
2      1     3   1894.23651  TRUE
3      1     4  96487.01697 FALSE
4      1     5  96485.33550 FALSE
5      2     3   1642.27932  TRUE
6      2     4  93185.78766 FALSE
7      2     5  93183.73986 FALSE
8      3     4 102445.53187 FALSE
9      3     5 102448.58978 FALSE
10     4     5      3.47365  TRUE
# ----------------------------- #


# Example console output for expected results:
> expected_results
[[1]]
[1] 1 2 3

[[2]]
[1] 4 5

This table already contains all the combinations between pairs of sites, but I need the combinations of all overlapping pairs (where close = TRUE) as one vector for each group (as in expected_results above).
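What is being asked for amounts to the connected components of a graph whose nodes are sites and whose edges are the pairs with close = TRUE: 1-3 and 3-2 chain together into one group even though 1-2 is not close. A minimal base-R sketch using a union-find (disjoint-set) structure is shown below. It assumes site IDs come only from siteA/siteB and that close is logical; `group_close_sites()` is a hypothetical helper name, and returning a site with no close partner as a one-element vector is an assumption about the desired output.

```r
# Example data from the question (only the columns the grouping needs)
df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  close = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Hypothetical helper: merges every close pair with union-find and
# returns one sorted vector of site IDs per group
group_close_sites <- function(df) {
  sites <- sort(unique(c(df$siteA, df$siteB)))
  # parent[i] is the index (into `sites`) of i's parent; a root points to itself
  parent <- seq_along(sites)

  # Follow parent links until reaching the root of i's group
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  pairs <- df[df$close, c("siteA", "siteB")]
  for (k in seq_len(nrow(pairs))) {
    a <- find_root(match(pairs$siteA[k], sites))
    b <- find_root(match(pairs$siteB[k], sites))
    if (a != b) parent[b] <- a  # merge the two groups
  }

  # Label every site with its root, split into groups,
  # and order the groups by their smallest site ID
  roots <- vapply(seq_along(sites), find_root, integer(1))
  groups <- unname(split(sites, roots))
  groups[order(vapply(groups, min, numeric(1)))]
}

group_close_sites(df)
```

Because the merging only ever follows close pairs, the number of sites (2 to 20+) and the threshold used to build `close` do not affect the logic.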

In the example data there are only 5 sites, but the number can vary from 2 to 20+. Also, in the example the threshold distance is 2500 and anything below that is considered close; however, this value can also vary depending on user input.

# Example dataset
df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  distance = c(2913.35364, 1894.23651, 96487.01697, 96485.33550, 1642.27932,  93185.78766, 93183.73986, 102445.53187, 102448.58978, 3.47365),
  close = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)
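Since the threshold can change with user input, the `close` column does not have to be stored; it can be recomputed from `distance` each time. A minimal sketch, assuming `threshold` is the user-supplied value and that "below" means strictly less than:

```r
# Distances from the question's example data
df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  distance = c(2913.35364, 1894.23651, 96487.01697, 96485.33550, 1642.27932,
               93185.78766, 93183.73986, 102445.53187, 102448.58978, 3.47365)
)

# `threshold` stands in for the user-supplied value (2500 in the example);
# a pair counts as close when its distance is strictly below it
threshold <- 2500
df$close <- df$distance < threshold
df
```

With `threshold <- 2500` this reproduces the `close` column shown in the question.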

I am struggling to find a solution and any guidance would be greatly appreciated. My apologies for not providing example code; I've tried multiple looping approaches and they all ended dismally.

Thanks!

Answer 1

This can probably be done in a better fashion with a few improvements.

CODE

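library(tidyverse)

df <- data.frame(
  siteA = c(1,1,1,1,2,2,2,3,3,4),
  siteB = c(2,3,4,5,3,4,5,4,5,5),
  close = c(FALSE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE)
)

# All distinct site IDs; each one must end up in exactly one group
unvisited_sites <- df %>%
  select(contains("site")) %>%
  unlist() %>%
  unique()

site_groups <- list()
i <- 1
while(length(unvisited_sites) > 0){
  
  # Start a new group from the first site not yet assigned to a group.
  # Seeding visited_sites with it keeps a site that has no close partner
  # as a one-site group instead of an empty vector.
  visited_sites <- unvisited_sites[[1]]
  S <- unvisited_sites[[1]]  # sites still to be expanded in this group
  while(length(S) > 0){
    
    u <- S[[1]]
    
    # Unvisited sites that share a close pair with u
    sites <- df %>%
      filter(siteA == u | siteB == u) %>%
      filter(close == TRUE) %>%
      select(siteA, siteB) %>%
      unlist() %>%
      unique() %>%
      intersect(unvisited_sites)
    
    visited_sites <- union(visited_sites, sites)
    unvisited_sites <- setdiff(unvisited_sites, u)
    # Queue the newly found sites and drop the one just expanded
    S <- union(S, intersect(sites, unvisited_sites)) %>% setdiff(u)
  }
  
  site_groups[[i]] <- visited_sites %>% sort()
  i <- i + 1
}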
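The loop above is a breadth-first search over the close pairs: it expands one site at a time and queues any unvisited neighbours. The same idea can be sketched in base R with a precomputed adjacency list, which avoids re-filtering the data frame for every site. `bfs_site_groups()` is a hypothetical name, and like the loop above it assumes `close` is already a logical column.

```r
df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  close = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Hypothetical base-R version of the breadth-first search above
bfs_site_groups <- function(df) {
  sites <- sort(unique(c(df$siteA, df$siteB)))
  pairs <- df[df$close, c("siteA", "siteB")]

  # Adjacency list: for each site, the sites it shares a close pair with
  neighbours <- lapply(sites, function(s) {
    c(pairs$siteB[pairs$siteA == s], pairs$siteA[pairs$siteB == s])
  })
  names(neighbours) <- sites  # look up by site ID as a character key

  unvisited <- sites
  groups <- list()
  while (length(unvisited) > 0) {
    # Start a new group at the first unvisited site
    queue <- unvisited[1]
    group <- queue
    unvisited <- unvisited[-1]
    while (length(queue) > 0) {
      u <- queue[1]
      queue <- queue[-1]
      new_sites <- intersect(neighbours[[as.character(u)]], unvisited)
      group <- c(group, new_sites)
      queue <- c(queue, new_sites)
      unvisited <- setdiff(unvisited, new_sites)
    }
    groups[[length(groups) + 1]] <- sort(group)
  }
  groups
}

bfs_site_groups(df)
```

Building the adjacency list once means each site is looked up in a short vector rather than the whole data frame, which matters little for 20 sites but keeps the search linear in the number of close pairs.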

OUTPUT

site_groups
[[1]]
[1] 1 2 3

[[2]]
[1] 4 5

Answer 2

I'm not entirely sure this will scale to more complex webs, but it works with the above data.

aggregate(siteA ~ siteB, df[df$close == T,], paste)

  siteB siteA
1     3  1, 2
2     5     4
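The caveat about more complex webs can be made concrete. Grouping by `siteB` only collects pairs that share the same `siteB`, so a chain that passes through a site appearing as `siteA` in one pair and `siteB` in another is split. The `chain` data frame below is made-up illustration data, not from the question.

```r
# Hypothetical chain: 1 is close to 3, and 3 is close to 5,
# so sites 1, 3 and 5 should form a single group
chain <- data.frame(
  siteA = c(1, 1, 3),
  siteB = c(3, 5, 5),
  close = c(TRUE, FALSE, TRUE)
)

# Same call as in the answer above
res <- aggregate(siteA ~ siteB, data = chain[chain$close, ], FUN = paste)
res
nrow(res)  # two rows, although all three sites belong to one group
```

The result also lists only `siteA` values under each `siteB` key, so the key site itself (3 in the question's data) has to be added back before it matches the expected list of vectors.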


2 answers

Answer 1
1 2020-06-26 13:11:56

Answer 2
0 2020-06-26 11:49:51
