R: Finding the indices of (best/partial) matching elements in two lists

Question

I have two lists of character vectors, called three_letters and four_letters, defined as:

library(tidyverse)  # provides map(), detect_index() and %>%

three_letters <- replicate(sample(letters, size = 3), n = 100, simplify = FALSE)

four_letters <- sample(three_letters, replace = FALSE, size = 100) %>%
  map(.f = ~ c(., sample(LETTERS, 1)))

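Each element of three_letters has a corresponding element in four_letters that shares all of its letters except for one "sub-element".

I want to produce a one-dimensional vector of the indices of the elements of four_letters that match each element of three_letters (matching 3 out of 4, or more generally n out of m).

I may be overthinking this, but here is the tedious and very non-generalizable solution I came up with: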

# first define helper function:
count_unique_list <- function(l1_element, l2_element){
  length(unique(unlist(append(l1_element,l2_element))))
}

# use nested map() functions

four_letter_indices <-
# for every element in three_letters:
  map(three_letters, .f = function(x){
    # for every element in four_letters:
    map(four_letters, .f = function(y){
      # is the length of unique union equal to 4?
      count_unique_list(x,y) == 4
    }) %>%
      # return index of TRUE
      detect_index(.f = isTRUE)
  }) %>%
  unlist()

# to check success visually I used cbind on arrayified lists:
cbind(matrix(unlist(three_letters), ncol = 3, byrow = TRUE),
      matrix(unlist(four_letters[four_letter_indices]), ncol = 4, byrow = TRUE))
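
The visual check above could also be done programmatically. The following is only a minimal sketch on toy data: three_toy, four_toy and toy_indices are hypothetical stand-ins for three_letters, four_letters and four_letter_indices, and it assumes a match means that every letter of the shorter vector occurs in the longer one.

```r
library(purrr)

# Hypothetical toy data standing in for the lists defined above
three_toy <- list(c("a", "b", "c"), c("d", "e", "f"))
four_toy  <- list(c("d", "e", "f", "X"), c("a", "b", "c", "Y"))
toy_indices <- c(2L, 1L)  # one index into four_toy per element of three_toy

# TRUE where every letter of the three-letter element occurs in its match
matches_ok <- map2_lgl(three_toy, four_toy[toy_indices], ~ all(.x %in% .y))

# A single TRUE means every pairing passed the check
all(matches_ok)
```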

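If possible, I would particularly like a Hadley Wickham style "tidy" solution, since those make the most sense to me and are easier to deploy in my current data analysis pipeline.

Cheers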

Answer 1

Here is one approach:

library(tidyverse)
three_letters %>%
  map(~ {
    a <- .x  # keep the outer element; the inner lambda rebinds .x
    which(map_lgl(four_letters, ~ all(a %in% .x)))
  })

We need to assign the outer .x to a new variable because, inside the nested map, .x is rebound to the element of the second-level list.
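
One way to sidestep the reassignment is to use named function arguments instead of nested formulas. The sketch below is an illustration rather than part of the original answer: first_match_index and n_shared are hypothetical names, the toy lists stand in for the real data, and it assumes "n out of m" means sharing at least n_shared distinct values. It returns a flat integer vector, with 0 where nothing matches.

```r
library(purrr)

# Hypothetical helper: index of the first element of `candidates` that shares
# at least `n_shared` distinct values with `target` (0 if there is none).
first_match_index <- function(target, candidates,
                              n_shared = length(unique(target))) {
  detect_index(candidates, function(candidate) {
    length(intersect(target, candidate)) >= n_shared
  })
}

# Hypothetical toy data standing in for three_letters and four_letters
three_toy <- list(c("a", "b", "c"), c("d", "e", "f"))
four_toy  <- list(c("d", "e", "f", "X"), c("a", "b", "c", "Y"))

# One integer per element of three_toy
toy_indices <- map_int(three_toy, first_match_index, candidates = four_toy)

# Looser matching: only 2 of the 3 letters need to be shared
partial_index <- first_match_index(c("a", "b", "q"), four_toy, n_shared = 2)
```

Because the outer and inner elements have their own names (target and candidate), no shadowing of .x can occur.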

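{...} simply lets you evaluate several expressions and returns only the value of the last one. Expressions are separated by ; or by a new line.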
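
As a small illustration of how braces behave in base R (the variable names are arbitrary):

```r
# Both expressions are evaluated; only the value of the last one is returned
result <- { a <- 2; a * 10 }
result
```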

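In the tidyverse (purrr), ~ denotes a lambda function expression

function(...)

or, more precisely, a formula created with ~ is converted to a function. The first argument passed through ... is bound to ., .x and ..1. For more information, see ?purrr::map.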
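
The conversion can be inspected directly with purrr::as_mapper(), which turns a formula into a function. This is a minimal sketch; add_one and same_first_arg are hypothetical names chosen for illustration.

```r
library(purrr)

# The formula is converted to an ordinary function
add_one <- as_mapper(~ .x + 1)
add_one(2)

# ., .x and ..1 all refer to the first argument
same_first_arg <- as_mapper(~ identical(., .x) && identical(.x, ..1))
same_first_arg("a")
```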

Solution 1
0 accepted, 2020-11-27 18:16:41
