Problem: I have a data frame (see the example data) that contains the distances between pairs of spatial points ('siteA' and 'siteB') and whether each pair is too close together ('close'). I need a way to combine sites that are close to each other into one vector per group. In the example data, site 1 is close to site 3 but far from site 2; however, site 3 is close to site 2, so sites 1, 2 and 3 should end up in one vector and sites 4 and 5 in another. All vectors should then be combined in a list.
# ----------------------------- #
# --- Example table of data --- #
# ----------------------------- #
   siteA siteB     distance close
1      1     2   2913.35364 FALSE
2      1     3   1894.23651  TRUE
3      1     4  96487.01697 FALSE
4      1     5  96485.33550 FALSE
5      2     3   1642.27932  TRUE
6      2     4  93185.78766 FALSE
7      2     5  93183.73986 FALSE
8      3     4 102445.53187 FALSE
9      3     5 102448.58978 FALSE
10     4     5      3.47365  TRUE
# ----------------------------- #
# Example console output for expected results:
> expected_results
[[1]]
[1] 1 2 3
[[2]]
[1] 4 5
This table already contains every pairwise combination of sites, but I need all overlapping pairs (where close = TRUE) merged into one vector per group, as in expected_results above.
The example data has only 5 sites, but the number can vary from 2 to 20+. In the example, the threshold is 2500 and any distance below it is considered close; this value can also vary depending on user input.
# Example dataset
df <- data.frame(
siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
distance = c(2913.35364, 1894.23651, 96487.01697, 96485.33550, 1642.27932, 93185.78766, 93183.73986, 102445.53187, 102448.58978, 3.47365),
close = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)
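Because the threshold depends on user input, the 'close' column could be derived from 'distance' instead of being typed in by hand. A minimal sketch, assuming a strict "less than" comparison; flag_close() and its threshold argument are hypothetical names, not part of the original data:

```r
# Hypothetical helper: flag pairs whose distance is below a user-supplied threshold.
flag_close <- function(df, threshold = 2500) {
  df$close <- df$distance < threshold
  df
}

# First three site pairs from the example data
pairs <- data.frame(
  siteA    = c(1, 1, 2),
  siteB    = c(2, 3, 3),
  distance = c(2913.35364, 1894.23651, 1642.27932)
)

default_flags <- flag_close(pairs)$close                    # FALSE TRUE TRUE
wider_flags   <- flag_close(pairs, threshold = 3000)$close  # TRUE TRUE TRUE
```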
I am struggling to find a solution, and any guidance would be greatly appreciated. My apologies for not providing example code; I've tried multiple looping approaches and they all ended dismally.
Thanks!
It can probably be done in a better fashion with a few improvements.
CODE
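library(tidyverse)

df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  close = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# All distinct site ids; sites are removed from here as they are processed
unvisited_sites <- df %>%
  select(contains("site")) %>%
  unlist() %>%
  unique()

site_groups <- list()
i <- 1

# Each pass of the outer loop collects one group of connected sites
while (length(unvisited_sites) > 0) {
  visited_sites <- NULL
  S <- unvisited_sites[[1]]  # sites still to be expanded in this group

  while (length(S) > 0) {
    u <- S[[1]]

    # Unvisited sites that share a close pair with u
    sites <- df %>%
      filter(siteA == u | siteB == u) %>%
      filter(close == TRUE) %>%
      select(siteA, siteB) %>%
      unlist() %>%
      unique() %>%
      intersect(unvisited_sites)

    # Adding u explicitly keeps a site with no close neighbours as its own group
    visited_sites <- union(visited_sites, c(u, sites))
    unvisited_sites <- setdiff(unvisited_sites, u)
    S <- union(S, intersect(sites, unvisited_sites)) %>% setdiff(u)
  }

  site_groups[[i]] <- visited_sites %>% sort()
  i <- i + 1
}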
OUTPUT
site_groups
[[1]]
[1] 1 2 3
[[2]]
[1] 4 5
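The same grouping can also be viewed as finding the connected components of a graph whose edges are the close pairs. Below is a minimal base R sketch of that idea using a simple union-find; group_close_sites() is a hypothetical helper name, and it assumes 'close' is a logical column (rows where it is NA are ignored):

```r
# Hypothetical helper: group sites into connected components of the "close" graph.
group_close_sites <- function(df) {
  sites  <- sort(unique(c(df$siteA, df$siteB)))
  parent <- seq_along(sites)  # every site starts in its own group

  # Follow parent links until reaching the root of a group.
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Merge the groups of the two sites in every close pair.
  close_pairs <- df[which(df$close), c("siteA", "siteB")]
  for (k in seq_len(nrow(close_pairs))) {
    a <- find_root(match(close_pairs$siteA[k], sites))
    b <- find_root(match(close_pairs$siteB[k], sites))
    if (a != b) parent[max(a, b)] <- min(a, b)
  }

  # Sites that share a root belong to the same group.
  roots <- vapply(seq_along(sites), find_root, integer(1))
  unname(split(sites, roots))
}

df <- data.frame(
  siteA = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  siteB = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
  close = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

groups <- group_close_sites(df)  # list of two vectors: 1 2 3 and 4 5
```

A site with no close neighbours ends up as a group of length one, and nothing here depends on the tidyverse.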
I'm not entirely sure this will scale to more complex webs, but it works with the above data.
aggregate(siteA ~ siteB, df[df$close == T,], paste)
siteB siteA
1 3 1, 2
2 5 4
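To illustrate the scaling concern: aggregate() only groups rows that share the same siteB, so a chain of close pairs is not merged. A small sketch with a hypothetical chain (1 close to 2, and 2 close to 3) that is not part of the original data:

```r
# Hypothetical chain: 1 is close to 2, and 2 is close to 3
chain <- data.frame(
  siteA = c(1, 2),
  siteB = c(2, 3),
  close = c(TRUE, TRUE)
)

by_siteB <- aggregate(siteA ~ siteB, chain[chain$close, ], paste)
n_rows <- nrow(by_siteB)  # 2: siteB 2 and siteB 3 are reported separately
```

Sites 1, 2 and 3 form one connected group here, but the result has two rows, so this approach would need an extra merging step for such data.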