R: find index of (best/incompletely) matching elements in two lists

I have two lists of character vectors called three_letters and four_letters defined as:

library(tidyverse)  # provides map(), detect_index() and %>%

three_letters <- replicate(sample(letters, size = 3), n = 100, simplify = FALSE)

four_letters <- sample(three_letters, replace = FALSE, size = 100) %>%
  map(.f = ~ c(., sample(LETTERS, 1)))

where each element in the three_letters list has a corresponding element in the four_letters list sharing all but one "subelement" letter.

I would like to produce a vector that gives, for each element of three_letters, the index of the element in four_letters that matches it (3 out of 4 letters, or more generally n out of m if possible).

I'm likely overthinking this but here's the tedious and very non-generalizable solution I've come up with:

# first define helper function:
count_unique_list <- function(l1_element, l2_element){
  length(unique(unlist(append(l1_element,l2_element))))
}

# use nested map() functions

four_letter_indices <-
# for every element in three_letters:
  map(three_letters, .f = function(x){
    # for every element in four_letters:
    map(four_letters, .f = function(y){
      # is the length of unique union equal to 4?
      count_unique_list(x,y) == 4
    }) %>%
      # return index of TRUE
      detect_index(.f = isTRUE)
  }) %>%
  unlist()

# to check success visually I used cbind on arrayified lists:
cbind(matrix(unlist(three_letters), ncol = 3, byrow = TRUE),
      matrix(unlist(four_letters[four_letter_indices]), ncol = 4, byrow = TRUE))

If possible, I would especially like a Hadley-Wickham-styled "tidy" solution to this as those make the most sense to me and tend to be more deployable in my current data analysis pipelines.

Cheers

Here's an approach:

library(tidyverse)
# returns a list with one integer vector of matching indices per element
three_letters %>%
  map(~ {
    a <- .x
    which(map_lgl(four_letters, ~ all(a %in% .x)))
  })

We need to copy the outer .x into a new variable (a) because inside the nested map_lgl call .x is rebound to the current element of four_letters.
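One way to sidestep the rebinding altogether is to use ordinary functions with named arguments instead of nested ~ lambdas. The sketch below does that, and it also returns a plain integer vector and makes the "n out of m" threshold a parameter. It is a minimal sketch rather than a definitive solution: match_index() and min_shared are hypothetical names introduced here, the seed is arbitrary, and it assumes that taking the first qualifying element of four_letters is acceptable when several qualify.

```r
library(purrr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
three_letters <- replicate(n = 100, sample(letters, size = 3), simplify = FALSE)
four_letters <- map(sample(three_letters), function(v) c(v, sample(LETTERS, 1)))

# match_index() and min_shared are hypothetical names, not part of purrr.
# For each element of `needles`, return the index of the first element of
# `haystack` that shares at least `min_shared` distinct values with it,
# or NA if no element qualifies.
match_index <- function(needles, haystack, min_shared) {
  map_int(needles, function(x) {
    shared <- map_int(haystack, function(y) length(intersect(x, y)))
    hit <- which(shared >= min_shared)
    if (length(hit) == 0L) NA_integer_ else hit[[1]]
  })
}

four_letter_indices <- match_index(three_letters, four_letters, min_shared = 3)
```

Because x and y are named explicitly, nothing is shadowed, and changing min_shared covers the generalized case without touching the function body.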

{...} just lets you evaluate multiple expressions and return only the value of the last one. The expressions are separated by ; or by a new line.

In purrr, ~ is shorthand for an anonymous (lambda) function of the form

function(...)

More precisely, a formula created with ~ is converted to a function, and the first argument passed through ... is made available as ., .x and ..1. See ?purrr::map for more.
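To make the shorthand concrete, here is a small sketch showing that the formula forms and an explicit function give the same result. The variable names (with_formula, with_dot, with_function, add_one) are made up for illustration; as_mapper() is the purrr function that turns a formula into a function.

```r
library(purrr)

# Three equivalent ways to add 1 to each element of 1:3.
with_formula  <- map_dbl(1:3, ~ .x + 1)
with_dot      <- map_dbl(1:3, ~ . + 1)
with_function <- map_dbl(1:3, function(x) x + 1)

# as_mapper() performs the formula-to-function conversion explicitly;
# ..1 refers to the same first argument as . and .x.
add_one <- as_mapper(~ ..1 + 1)
add_one(10)
```

All three map_dbl() calls return the same numeric vector, which is why the choice between . and .x is purely a matter of style until the lambdas are nested.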
