R: find index of (best/incompletely) matching elements in two lists
I have two lists of character vectors called three_letters and four_letters, defined as:
library(tidyverse)  # provides %>% and purrr::map()
three_letters <- replicate(n = 100, expr = sample(letters, size = 3), simplify = FALSE)
four_letters <- sample(three_letters, replace = FALSE, size = 100) %>%
  map(.f = ~ c(., sample(LETTERS, 1)))
where each element in the three_letters list has a corresponding element in the four_letters list that shares all but one "subelement" letter.
I would like to produce a 1D vector of the INDEX of the element in four_letters that matches (3 out of 4, or, generalized, n out of m if possible) each element in three_letters.
I'm likely overthinking this, but here's the tedious and very non-generalizable solution I've come up with:
# first define helper function:
count_unique_list <- function(l1_element, l2_element) {
  length(unique(unlist(append(l1_element, l2_element))))
}
# use nested map() functions
four_letter_indices <-
  # for every element in three_letters:
  map(three_letters, .f = function(x) {
    # for every element in four_letters:
    map(four_letters, .f = function(y) {
      # is the length of unique union equal to 4?
      count_unique_list(x, y) == 4
    }) %>%
      # return index of the first TRUE
      detect_index(.f = isTRUE)
  }) %>%
  unlist()
# to check success visually I used cbind on arrayified lists:
cbind(matrix(unlist(three_letters), ncol = 3, byrow = TRUE),
      matrix(unlist(four_letters[four_letter_indices]), ncol = 4, byrow = TRUE))
If possible, I would especially like a Hadley-Wickham-styled "tidy" solution to this, as those make the most sense to me and tend to be more deployable in my current data analysis pipelines.
Cheers
Here's an approach:
library(tidyverse)
three_letters %>%
  map(~ {a = .x; which(map_lgl(four_letters, ~ all(a %in% .x)))})
We need to reassign the outer .x to a new variable because inside the nested map, .x will be reassigned to the second level.
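One way to sidestep the .x shadowing entirely is to give the inner and outer functions explicitly named arguments. The following is only a sketch: first_match and its n_shared parameter are hypothetical names, not part of purrr, and the seed is an assumption added for reproducibility. It returns a plain integer vector and lets the "n out of m" threshold be adjusted.

```r
library(purrr)

set.seed(1)  # assumed seed, only so the sketch is reproducible
three_letters <- replicate(n = 100, expr = sample(letters, size = 3), simplify = FALSE)
four_letters <- map(sample(three_letters), ~ c(.x, sample(LETTERS, 1)))

# Hypothetical helper: index of the first element of `candidates` that
# shares at least `n_shared` letters with `target` (NA if none does).
first_match <- function(target, candidates, n_shared = length(target)) {
  hits <- map_lgl(candidates, function(candidate) {
    sum(target %in% candidate) >= n_shared
  })
  idx <- which(hits)
  if (length(idx) == 0) NA_integer_ else idx[[1]]
}

# One index per element of three_letters, as an integer vector.
four_letter_indices <- map_int(three_letters, function(target) {
  first_match(target, four_letters)
})
```

Because the arguments are named target and candidate, nothing needs to be copied into a temporary variable. If several elements of four_letters qualify, this sketch keeps only the first one.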
{...} just allows you to evaluate multiple expressions and only return the last. The expressions are separated by ; or a new line.
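As a small illustration of the brace behaviour in base R (the variable names here are made up for the example), both forms below evaluate two expressions and yield the value of the last one:

```r
# Expressions separated by a semicolon on one line.
result_semicolon <- {a <- 2; a * 10}

# The same expressions separated by new lines.
result_newline <- {
  b <- 2
  b * 10
}

print(result_semicolon)  # 20
print(result_newline)    # 20
```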
In purrr, ~ denotes a lambda function expression, function(...). Or more precisely, a formula created with ~ is converted to a function. The first argument of ... is assigned to ., .x and ..1. See help(purrr::map) for more.
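To see the conversion directly, purrr exports as_mapper(), which turns a formula into an ordinary function. This is a minimal sketch; add_one and same_arg are illustrative names only.

```r
library(purrr)

# A formula becomes a regular function.
add_one <- as_mapper(~ .x + 1)
is.function(add_one)  # TRUE
add_one(2)            # 3

# ., .x and ..1 all refer to the first argument.
same_arg <- as_mapper(~ c(., .x, ..1))
same_arg("a")         # "a" "a" "a"
```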