
R - Create list of vectors for all combinations combined based on threshold value

Problem: I have a data frame (see example data) containing the distances between pairs of spatial points ('siteA' and 'siteB') and whether each pair is too close together ('close'). I need a way to combine sites that are close to each other into one vector. In the example data, site 1 is close to site 3 but far from site 2; however, site 3 is close to site 2. I therefore need sites 1, 2 and 3 in one vector and sites 4 and 5 in another, with all vectors combined in a list.

# ----------------------------- #
# --- Example table of data --- #
# ----------------------------- #
   siteA siteB     distance close
1      1     2   2913.35364 FALSE
2      1     3   1894.23651  TRUE
3      1     4  96487.01697 FALSE
4      1     5  96485.33550 FALSE
5      2     3   1642.27932  TRUE
6      2     4  93185.78766 FALSE
7      2     5  93183.73986 FALSE
8      3     4 102445.53187 FALSE
9      3     5 102448.58978 FALSE
10     4     5      3.47365  TRUE
# ----------------------------- #


# Example console output for expected results:
> expected_results
[[1]]
[1] 1 2 3

[[2]]
[1] 4 5

This table already contains every pairwise combination of sites, but I need all overlapping pairs (where close = TRUE) merged into one vector per group, as in expected_results above.

The example data has only 5 sites, but the number can vary from 2 to 20+. In the example, the distance threshold is 2500 and anything below it is considered close; this value can also vary depending on user input.

# Example dataset
df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  distance = c(2913.35364, 1894.23651, 96487.01697, 96485.33550, 1642.27932,  93185.78766, 93183.73986, 102445.53187, 102448.58978, 3.47365),
  close = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

I am struggling to find a solution and any guidance would be greatly appreciated. My apologies for not providing example code; I've tried multiple looping approaches and they all ended dismally.

Thanks!

It can probably be done in a better fashion with a few improvements.

CODE

library(tidyverse)

df <- data.frame(
  siteA = c(1,1,1,1,2,2,2,3,3,4),
  siteB = c(2,3,4,5,3,4,5,4,5,5),
  close = c(F,T,F,F,T,F,F,F,F,T)
)

unvisited_sites <- df %>%
  select(contains("site")) %>%
  unlist() %>%
  unique()

site_groups <- list()
i <- 1
while(length(unvisited_sites) > 0){
  
  visited_sites <- NULL
  S <- unvisited_sites[[1]]
  while(length(S) > 0){
    
    u <- S[[1]]
    
    sites <- df %>%
      filter(siteA == u | siteB == u) %>%
      filter(close == TRUE) %>%
      select(siteA, siteB) %>%
      unlist() %>%
      unique() %>%
      intersect(unvisited_sites)
    
    # Include u itself so a site with no close neighbours still forms its own group
    visited_sites <- union(visited_sites, c(u, sites))
    unvisited_sites <- setdiff(unvisited_sites, u)
    S <- union(S, intersect(sites, unvisited_sites)) %>% setdiff(u)
  }
  
  site_groups[[i]] <- visited_sites %>% sort()
  i <- i + 1
}

OUTPUT

site_groups
[[1]]
[1] 1 2 3

[[2]]
[1] 4 5
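
For comparison, the same grouping can be sketched in base R without tidyverse, by treating each close pair as an edge and merging group labels. This is only a sketch under assumptions: the function name `group_close_sites` and its `threshold` argument are hypothetical, and it recomputes closeness from the `distance` column of the question's dataset rather than using `close`.

```r
# Hypothetical helper (not part of the answer above): group sites into
# connected components, treating each pair closer than `threshold` as an edge.
group_close_sites <- function(df, threshold = 2500) {
  sites <- sort(unique(c(df$siteA, df$siteB)))
  # Start with every site in its own group, labelled by the site itself.
  group <- setNames(sites, sites)
  edges <- df[df$distance < threshold, c("siteA", "siteB")]
  # Merge the two groups joined by each close pair.
  for (k in seq_len(nrow(edges))) {
    ga <- group[[as.character(edges$siteA[k])]]
    gb <- group[[as.character(edges$siteB[k])]]
    if (ga != gb) group[group == gb] <- ga
  }
  # One sorted vector per group; a site with no close neighbour stays alone.
  unname(split(sites, group))
}

# Usage with the question's example data
df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  distance = c(2913.35364, 1894.23651, 96487.01697, 96485.33550, 1642.27932,
               93185.78766, 93183.73986, 102445.53187, 102448.58978, 3.47365)
)
res <- group_close_sites(df, threshold = 2500)
```

With the example data and a threshold of 2500, `res` should hold the two vectors from expected_results. Relabelling groups on every edge is simple rather than fast, which seems acceptable for the 2 to 20+ sites mentioned in the question.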

I'm not entirely sure this will scale to more complex webs, but it works with the above data.

aggregate(siteA ~ siteB, df[df$close == T,], paste)

  siteB siteA
1     3  1, 2
2     5     4
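
The scaling caveat can be probed with a small chain. The sketch below uses a hypothetical three-site dataset (`chain`, not from the question) in which 1-2 and 2-3 are close but 1-3 is not, and uses `split()` instead of `aggregate()` so the groups stay numeric; the grouping key is still siteB.

```r
# Hypothetical chain: 1-2 and 2-3 are close, 1-3 is not.
chain <- data.frame(
  siteA = c(1, 1, 2),
  siteB = c(2, 3, 3),
  close = c(TRUE, FALSE, TRUE)
)
near <- chain[chain$close, ]

# Group siteA by siteB, as the aggregate() call does.
by_siteB <- split(near$siteA, near$siteB)
by_siteB
```

Here grouping by siteB produces two separate entries (one per siteB value), although sites 1, 2 and 3 are all linked through site 2 and would be expected in a single vector. Grouping on one column alone therefore appears insufficient once close pairs form chains across different siteB values.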


 