How to match a column's values against another data frame in R using dplyr?

Question

I have two datasets that look like this. The first is:

id	gear
A1	A
A2	B
A3	C
A4	D
A5	E
A6	F
A7	G
A8	H
A9	I
A10	G

And the second:

id	gear2
A1	A
A4	E
A2	A
A5	E
A13	B
A3	C
A9	I
A8	B
A7	G
A20	G
A21	B
A23	D
A33	E

These are two unbalanced data frames. The first is the recorded data set. The second contains what is known about the gear associated with each id. I want to check whether what is recorded in the first data frame agrees with what is known. Specifically, for each id individually, I want to check whether the gear is the same in both data frames.

Ideally, the result would be:

id	gear	CHECK
A1	A	TRUE
A2	B	FALSE
A3	C	TRUE
A4	D	FALSE
A5	E	TRUE
A6	F	N/A
A7	G	TRUE
A8	H	FALSE
A9	I	TRUE
A10	G	N/A

library(dplyr)  # dplyr also makes tibble() available

# Recorded data
id1 = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10")
Gear1 = c("A","B","C","D","E","F","G","H","I","G")
dat1 = tibble(id1, Gear1); dat1

# Known gear per id
id2 = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33")
Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
dat2 = tibble(id2, Gear2); dat2

How can I do this in R using the dplyr package?

Answer 1

You can use a left_join and then compare the two columns:

library(dplyr)

dat1 %>% 
  left_join(dat2, by = c("id1" = "id2")) %>% 
  mutate(CHECK = Gear1 == Gear2) %>% 
  select(id = id1, gear = Gear1, CHECK)

# A tibble: 10 × 3
   id    gear  CHECK
   <chr> <chr> <lgl>
 1 A1    A     TRUE 
 2 A2    B     FALSE
 3 A3    C     TRUE 
 4 A4    D     FALSE
 5 A5    E     TRUE 
 6 A6    F     NA   
 7 A7    G     TRUE 
 8 A8    H     FALSE
 9 A9    I     TRUE 
10 A10   G     NA

Have a look at the dplyr documentation on how to use joins.

Edit

Thanks to Ritchie Sacramento for the trick of doing the renaming directly in the select function.
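
The NA rows in the output above (A6 and A10) appear because left_join keeps every row of dat1 and fills Gear2 with NA where dat2 has no matching id, and comparing anything with NA gives NA. As a rough illustration of the same lookup without a join, here is a minimal sketch using base R's match(). It assumes each id occurs at most once in dat2, since match() only returns the first hit; the name `result` is just a placeholder for this example.

```r
library(dplyr)

# Same data as in the question
dat1 <- tibble(
  id1   = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10"),
  Gear1 = c("A","B","C","D","E","F","G","H","I","G")
)
dat2 <- tibble(
  id2   = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33"),
  Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
)

# match() returns the row position of each id1 within dat2$id2,
# or NA when the id is absent (assumption: ids in dat2 are unique)
result <- dat1 %>%
  mutate(CHECK = Gear1 == dat2$Gear2[match(id1, dat2$id2)]) %>%
  select(id = id1, gear = Gear1, CHECK)

result
```

If dat2 could contain the same id more than once, the left_join version would return one row per match instead, so the two approaches are only interchangeable under the uniqueness assumption.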

Answer 2

Try this:

dat1 |> 
  mutate(check = ifelse(!id1 %in% dat2$id2, 
                        NA,
                        ifelse(paste(id1, Gear1) %in% paste(dat2$id2, dat2$Gear2), 
                               TRUE, 
                               FALSE)))
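
The code above tests two things in order: whether the id exists in dat2 at all (otherwise NA), and then whether the id/gear pair exists in dat2. For readers who find nested ifelse hard to follow, the same logic can probably be written with dplyr's case_when. This is only a sketch of an equivalent formulation, not a change to the answer; `result` is a placeholder name.

```r
library(dplyr)

# Same data as in the question
dat1 <- tibble(
  id1   = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10"),
  Gear1 = c("A","B","C","D","E","F","G","H","I","G")
)
dat2 <- tibble(
  id2   = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33"),
  Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
)

result <- dat1 |>
  mutate(check = case_when(
    # id not known at all: nothing to compare against
    !id1 %in% dat2$id2 ~ NA,
    # the exact id/gear pair is present in dat2
    paste(id1, Gear1) %in% paste(dat2$id2, dat2$Gear2) ~ TRUE,
    # id is known, but with a different gear
    TRUE ~ FALSE
  ))

result
```

The conditions are evaluated top to bottom, so the order matters: the "unknown id" case has to come first, otherwise those rows would fall through to FALSE.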

Answer 3

library(tidyverse)

dat1 = rename(dat1, id = 'id1')
dat2 = rename(dat2, id = 'id2')

check_data = dat1 %>% 
  full_join(dat2, by='id') %>% 
  mutate(check = ifelse(Gear1==Gear2, TRUE, FALSE)) %>% 
  filter(! is.na(Gear1))

Output:

check_data
# A tibble: 10 x 4
   id    Gear1 Gear2 check
   <chr> <chr> <chr> <lgl>
 1 A1    A     A     TRUE 
 2 A2    B     A     FALSE
 3 A3    C     C     TRUE 
 4 A4    D     E     FALSE
 5 A5    E     E     TRUE 
 6 A6    F     NA    NA   
 7 A7    G     G     TRUE 
 8 A8    H     B     FALSE
 9 A9    I     I     TRUE 
10 A10   G     NA    NA
