R: Join two tables (tibbles) by *list* columns
Seems like there should be a simple answer for this, but I haven't been able to find one:
library(tibble)
tib1 <- tibble(x = list(1, 2, 3), y = list(4, 5, 6))
tib1
# A tibble: 3 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>
2 <dbl [1]> <dbl [1]>
3 <dbl [1]> <dbl [1]>
tib2 <- tibble(x = list(1, 2, 4, 5), y = list(4, c(5, 10), 6, 7))
tib2
# A tibble: 4 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>
2 <dbl [1]> <dbl [2]>
3 <dbl [1]> <dbl [1]>
4 <dbl [1]> <dbl [1]>
dplyr::inner_join(tib1, tib2)
Joining, by = c("x", "y")
Error in inner_join_impl(x, y, by$x, by$y, suffix$x, suffix$y) : Can't join on 'x' x 'x' because of incompatible types (list / list)
So is there a way to perform a join on list columns (before I start writing my own)?
Basically, if the values in both key variables are identical, I want the row to be included in the final table, and if not, not. In the above example there are two key variables, x and y, and the result should be only the first row of the two tibbles, since it's the only row that is identical in both key variables:
tibble(x = list(1), y = list(4))
# A tibble: 1 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>
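The matching rule described above (keep a row pair only when every key cell is identical()) can be sketched directly in base R by comparing every row pair. This is a brute-force illustration, not a dplyr feature: list_inner_join() is a hypothetical helper, the example uses base data frames with I() list columns so it needs no packages, and the O(n × m) pairwise comparison only suits small tables.

```r
# Hypothetical helper (not a dplyr function): inner join on list columns that
# keeps a row pair only when the cells are identical() in every key column.
list_inner_join <- function(a, b, by) {
  # Every candidate (row of a, row of b) pair
  pairs <- expand.grid(i = seq_len(nrow(a)), j = seq_len(nrow(b)))
  keep <- vapply(seq_len(nrow(pairs)), function(k) {
    i <- pairs$i[k]
    j <- pairs$j[k]
    all(vapply(by, function(col) identical(a[[col]][[i]], b[[col]][[j]]),
               logical(1)))
  }, logical(1))
  matched <- pairs[keep, , drop = FALSE]
  # Key cells are equal by construction, so take them from a ...
  res <- a[matched$i, , drop = FALSE]
  # ... and append any non-key columns of b with a ".y" suffix
  for (col in setdiff(names(b), by)) {
    res[[paste0(col, ".y")]] <- b[[col]][matched$j]
  }
  rownames(res) <- NULL
  res
}

# Same data as the question, as base data frames with list columns
df1 <- data.frame(x = I(list(1, 2, 3)), y = I(list(4, 5, 6)))
df2 <- data.frame(x = I(list(1, 2, 4, 5)), y = I(list(4, c(5, 10), 6, 7)))

res <- list_inner_join(df1, df2, by = c("x", "y"))
```

Only the first rows match: in the second rows, y is 5 in one table and c(5, 10) in the other, so identical() rejects the pair. Since the helper only relies on [[ and row indexing, it should also accept the tibbles from the question.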
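We could use hashes from the digest package: each list cell is hashed into a character string, and the join is then done on those ordinary character columns instead of on the list columns: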
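library(dplyr)
library(purrr)
tib1 <- tibble(x = list(1, 2, 3), y = list(4, 5, 6))
tib2 <- tibble(x = list(1, 2, 4, 5), y = list(4, c(5, 10), 6, 7))
# Hash every list cell to an MD5 string; adds columns x_hash and y_hash
tib1 <- mutate(tib1, across(c(x, y), ~ map_chr(.x, digest::digest), .names = "{.col}_hash"))
tib2 <- mutate(tib2, across(c(x, y), ~ map_chr(.x, digest::digest), .names = "{.col}_hash"))
# Join on the hash columns; the original list columns get .x/.y suffixes
inner_join(tib1, tib2, by = c("x_hash", "y_hash")) %>%
  select(x.x, x.y)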
# A tibble: 1 × 2
x.x x.y
<list> <list>
1 <dbl [1]> <dbl [1]>
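A base-R variant of the same hashing idea, for when digest isn't available: each list cell is fingerprinted as the hex string of its serialize() output, the per-column fingerprints are pasted into one key per row, and merge() does an ordinary inner join on that string key. This swaps serialize() in for digest::digest(); cell_key() and hash_inner_join() are hypothetical helper names, and the data uses base data frames with I() list columns.

```r
# Hypothetical helper: fingerprint each list cell as the hex string of its
# serialize() output (base R only, no hashing package needed).
cell_key <- function(col) {
  vapply(col, function(e) paste(serialize(e, NULL), collapse = ""), character(1))
}

# Hypothetical helper: inner join two data frames on the list columns `by`
# by merging on one combined string key per row.
hash_inner_join <- function(a, b, by) {
  row_key <- function(d) {
    # Hex strings never contain "|", so it is a safe separator
    do.call(paste, c(lapply(by, function(col) cell_key(d[[col]])), sep = "|"))
  }
  a$.key <- row_key(a)
  b$.key <- row_key(b)
  res <- merge(a, b, by = ".key")  # merge() defaults to an inner join
  res$.key <- NULL
  res
}

df1 <- data.frame(x = I(list(1, 2, 3)), y = I(list(4, 5, 6)))
df2 <- data.frame(x = I(list(1, 2, 4, 5)), y = I(list(4, c(5, 10), 6, 7)))

res <- hash_inner_join(df1, df2, by = c("x", "y"))
```

As with the digest version, two cells match when their serializations match, which for plain numeric vectors like these agrees with identical(). The list columns present in both inputs come back with merge()'s default .x/.y suffixes (x.x, y.x, x.y, y.y).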