Is there a fast way to compare two lists of vectors in R?

If there are two lists of the form

list_a <- list(c(1,2,3,4,5), c(6,7,8,9,10), c(11,12,13,14))
list_b <- list(c(1,9,3,7,5), c(2,7,13,9,10), c(1,12,5,14))

I want to compare each element of list_a against each element of list_b and count the matching values. That is, I want to compute the following:

sum(c(1,2,3,4,5) %in% c(1,9,3,7,5)) 
sum(c(1,2,3,4,5) %in% c(2,7,13,9,10))
sum(c(1,2,3,4,5) %in% c(1,12,5,14))

sum(c(6,7,8,9,10) %in% c(1,9,3,7,5)) 
sum(c(6,7,8,9,10) %in% c(2,7,13,9,10))
sum(c(6,7,8,9,10) %in% c(1,12,5,14))

sum(c(11,12,13,14) %in% c(1,9,3,7,5)) 
sum(c(11,12,13,14) %in% c(2,7,13,9,10))
sum(c(11,12,13,14) %in% c(1,12,5,14))

I tried the following code using the sapply() function, and the output is as expected (columns map to elements of list_a and rows map to elements of list_b). However, when the lists are long this code is too slow (imagine list_a and list_b with 10000 elements each).

test <- sapply(list_a, function(x) {
  # For one element of list_a, count the matches against every element of list_b
  sapply(list_b, function(y) sum(x %in% y))
})

Output

     [,1] [,2] [,3]
[1,]    3    2    0
[2,]    1    3    1
[3,]    2    0    2

Does anyone have an idea?

You can try using the map() function from purrr. It reduces the runtime by more than half.

library(purrr)
out_sum <- list_b %>% map(function (y) {
  list_a %>%  map(function(x) sum(x %in% y))
})
out_matrix <- matrix(unlist(out_sum), ncol = length(list_a), byrow = TRUE)
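A runtime claim like this depends on the data, so it may be worth measuring on lists shaped like your own. The following is only a rough sketch: the test data (200 vectors of 5 integers per list) and the helper names make_list, nested_sapply and nested_map are assumptions made for illustration, and map_int() is swapped in for map() so that each inner call returns an integer vector directly.

```r
set.seed(1)

# Assumed test data (not from the question): n vectors of 5 integers drawn from 1..50
make_list <- function(n) replicate(n, sample.int(50, 5), simplify = FALSE)
big_a <- make_list(200)
big_b <- make_list(200)

# The nested sapply() approach from the question
nested_sapply <- function(a, b) {
  sapply(a, function(x) sapply(b, function(y) sum(x %in% y)))
}

# A purrr variant of the answer above, using map_int() for the inner loop
nested_map <- function(a, b) {
  out <- purrr::map(b, function(y) purrr::map_int(a, function(x) sum(x %in% y)))
  matrix(unlist(out), ncol = length(a), byrow = TRUE)
}

res_sapply <- nested_sapply(big_a, big_b)
print(system.time(nested_sapply(big_a, big_b)))

# Only run the purrr version if the package is installed
if (requireNamespace("purrr", quietly = TRUE)) {
  res_map <- nested_map(big_a, big_b)
  print(system.time(nested_map(big_a, big_b)))
  # Both approaches should give the same matrix (rows = list_b, columns = list_a)
  stopifnot(all(res_sapply == res_map))
}
```

For more stable timings than a single system.time() call, repeating each call several times and comparing the medians is usually more informative.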

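Another option is to use outer(). Note that the result has rows for list_a and columns for list_b, i.e. it is the transpose of the sapply() output in the question; wrap it in t() if you need the same orientation.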

check <- function(x, y) sum(list_a[[x]] %in% list_b[[y]])
test  <- outer(seq_along(list_a), seq_along(list_b), Vectorize(check))
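Both answers above still evaluate %in% once per pair of vectors. A different idea, which is not from the original answers, is to turn each list into a value-by-element matrix and get all counts from a single matrix product. The sketch below assumes the vectors are atomic and of the same type; count_matches is a hypothetical helper name. It builds dense matrices, so for very long lists with many distinct values a sparse representation (for example via the Matrix package) would probably be needed instead.

```r
list_a <- list(c(1,2,3,4,5), c(6,7,8,9,10), c(11,12,13,14))
list_b <- list(c(1,9,3,7,5), c(2,7,13,9,10), c(1,12,5,14))

# Hypothetical helper (not part of the original post).
# Returns a matrix with rows for list_b and columns for list_a,
# the same orientation as the nested sapply() in the question.
count_matches <- function(a, b) {
  vals <- unique(c(unlist(a), unlist(b)))  # every distinct value in either list
  n <- length(vals)

  # a_counts[v, j]: how often value v occurs in a[[j]]
  # (duplicates are counted, just as sum(x %in% y) counts them)
  a_counts <- matrix(
    vapply(a, function(x) tabulate(match(x, vals), nbins = n), integer(n)),
    nrow = n
  )

  # b_present[v, i]: 1 if value v occurs in b[[i]], otherwise 0
  b_present <- matrix(
    vapply(b, function(y) as.integer(vals %in% y), integer(n)),
    nrow = n
  )

  # t(b_present) %*% a_counts: entry [i, j] equals sum(a[[j]] %in% b[[i]])
  crossprod(b_present, a_counts)
}

fast <- count_matches(list_a, list_b)

# Check against the nested sapply() from the question
reference <- sapply(list_a, function(x) sapply(list_b, function(y) sum(x %in% y)))
stopifnot(all(fast == reference))
```

Whether this is actually faster depends on the number of distinct values relative to the list lengths, so it is worth timing on realistic data before relying on it.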
