
R - Create a list of vectors combining all linked sites based on a threshold value

Problem: I have a dataframe (see example data) that contains the distances between pairs of spatial points ('siteA' and 'siteB') and whether they are too close to each other ('close'). I need a way to combine the sites that are close to each other into one vector. In the example data, site 1 is close to site 3 but far from site 2; however, site 3 is close to site 2. Sites 1, 2 and 3 are therefore linked and should end up in one vector, while sites 4 and 5 form another vector. All of these vectors should then be combined in a list.

# ----------------------------- #
# --- Example table of data --- #
# ----------------------------- #
   siteA siteB     distance close
1      1     2   2913.35364 FALSE
2      1     3   1894.23651  TRUE
3      1     4  96487.01697 FALSE
4      1     5  96485.33550 FALSE
5      2     3   1642.27932  TRUE
6      2     4  93185.78766 FALSE
7      2     5  93183.73986 FALSE
8      3     4 102445.53187 FALSE
9      3     5 102448.58978 FALSE
10     4     5      3.47365  TRUE
# ----------------------------- #


# Example console output for expected results:
> expected_results
[[1]]
[1] 1 2 3

[[2]]
[1] 4 5

This table already contains every pairwise combination of sites, but I need to merge all overlapping close pairs (close = TRUE), including pairs that are linked only indirectly through a shared site, into one vector per group, as in expected_results above.
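One way to read this requirement: treat every `close == TRUE` row as an undirected edge between two sites, so the wanted groups are the connected components of that graph. A minimal base-R sketch under that assumption; the helper name `group_close_sites` is hypothetical, and it uses simple label propagation rather than a graph library:

```r
# Question's example data, with close already computed
df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  close = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Hypothetical helper: connected components of the "close" graph
group_close_sites <- function(df) {
  sites <- sort(unique(c(df$siteA, df$siteB)))
  # Each site starts with its own id as its group label
  label <- setNames(sites, sites)
  edges <- df[df$close, c("siteA", "siteB")]

  # Push the smaller label across every close pair until nothing changes
  repeat {
    changed <- FALSE
    for (k in seq_len(nrow(edges))) {
      a <- as.character(edges$siteA[k])
      b <- as.character(edges$siteB[k])
      m <- min(label[[a]], label[[b]])
      if (label[[a]] != m || label[[b]] != m) {
        label[c(a, b)] <- m
        changed <- TRUE
      }
    }
    if (!changed) break
  }

  # One vector per group; groups ordered by their smallest site id
  unname(split(sites, label))
}

group_close_sites(df)
```

Every group ends up labelled by its smallest site id, so a site with no close partner comes back as a one-element vector instead of being dropped.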

In the example data there are only 5 sites, but the number of sites can vary from 2 to 20+. Also, the example uses a threshold distance of 2500 (anything below it is considered close), but this value can vary depending on user input.
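Because the threshold depends on user input, the `close` column can be derived from `distance` instead of being stored. A short sketch, assuming the threshold is a single number and that "below" means strictly less than:

```r
# Question's example data without a precomputed close column
df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  distance = c(2913.35364, 1894.23651, 96487.01697, 96485.33550, 1642.27932,
               93185.78766, 93183.73986, 102445.53187, 102448.58978, 3.47365)
)

# User-supplied threshold; 2500 matches the question's example
threshold <- 2500

# A pair counts as close when its distance is strictly below the threshold
df$close <- df$distance < threshold
df
```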

# Example dataset
df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  distance = c(2913.35364, 1894.23651, 96487.01697, 96485.33550, 1642.27932,  93185.78766, 93183.73986, 102445.53187, 102448.58978, 3.47365),
  close = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

I am struggling to find a solution and any guidance would be greatly appreciated. My apologies for not providing example code; I've tried multiple looping approaches and they all ended dismally.

Thanks!

It can probably be done more elegantly with a few improvements.

CODE

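library(tidyverse)

df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  close = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# All distinct site ids that still need to be assigned to a group
unvisited_sites <- df %>%
  select(siteA, siteB) %>%
  unlist() %>%
  unique()

site_groups <- list()
i <- 1
while (length(unvisited_sites) > 0) {

  # Seed a new group with the first unvisited site; starting visited_sites
  # from the seed keeps sites without any close neighbour as their own group
  S <- unvisited_sites[[1]]
  visited_sites <- S
  while (length(S) > 0) {

    u <- S[[1]]

    # u plus every site it is close to, restricted to still-unvisited sites
    sites <- df %>%
      filter(siteA == u | siteB == u) %>%
      filter(close == TRUE) %>%
      select(siteA, siteB) %>%
      unlist() %>%
      unique() %>%
      intersect(unvisited_sites)

    visited_sites <- union(visited_sites, sites)
    unvisited_sites <- setdiff(unvisited_sites, u)
    # Queue the newly found neighbours and drop the site just processed
    S <- union(S, intersect(sites, unvisited_sites)) %>% setdiff(u)
  }

  site_groups[[i]] <- visited_sites %>% sort()
  i <- i + 1
}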

OUTPUT

site_groups
[[1]]
[1] 1 2 3

[[2]]
[1] 4 5
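The loop above is essentially a breadth-first search: take an unvisited site, then repeatedly pull in the unvisited sites it is close to. A sketch of the same idea without the tidyverse dependency, assuming `df` has `siteA`, `siteB` and a logical `close` column; `bfs_groups` is a hypothetical name, not part of the answer:

```r
df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  close = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Hypothetical helper: breadth-first search over the close pairs, base R only
bfs_groups <- function(df) {
  edges <- df[df$close, c("siteA", "siteB")]
  unvisited <- sort(unique(c(df$siteA, df$siteB)))
  groups <- list()
  while (length(unvisited) > 0) {
    # Seed a new group with the first unvisited site
    queue <- unvisited[1]
    group <- queue
    unvisited <- unvisited[-1]
    while (length(queue) > 0) {
      u <- queue[1]
      queue <- queue[-1]
      # Neighbours of u through any close pair, in either column
      nb <- c(edges$siteB[edges$siteA == u], edges$siteA[edges$siteB == u])
      nb <- intersect(nb, unvisited)
      unvisited <- setdiff(unvisited, nb)
      queue <- c(queue, nb)
      group <- c(group, nb)
    }
    groups[[length(groups) + 1]] <- sort(group)
  }
  groups
}

bfs_groups(df)
```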

I'm not entirely sure this will scale to more complex webs, but it works with the above data.

aggregate(siteA ~ siteB, df[df$close == T,], paste)

  siteB siteA
1     3  1, 2
2     5     4
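The doubt about more complex webs is justified: `aggregate()` only collects the direct partners of each `siteB`, so close pairs that form a chain are not merged. With the question's data it happens to work because sites 1 and 2 share the partner 3. A small made-up example with the close pairs (1, 2) and (2, 3):

```r
# Made-up chain: 1 is close to 2, and 2 is close to 3
chain <- data.frame(
  siteA = c(1, 2),
  siteB = c(2, 3),
  close = c(TRUE, TRUE)
)

# Same call as above, applied to the chain
res <- aggregate(siteA ~ siteB, chain[chain$close, ], paste)
print(res)
# Two separate rows (siteB 2 with siteA 1, siteB 3 with siteA 2),
# not a single group containing 1, 2 and 3
```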

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact:yoyou2525@163.com.
