How can I match the values of a column against another column of a data frame in R using dplyr?
I have two datasets that look like this. The first is:
id | gear |
---|---|
A1 | A |
A2 | B |
A3 | C |
A4 | D |
A5 | E |
A6 | F |
A7 | G |
A8 | H |
A9 | I |
A10 | G |
And the second:
id | gear2 |
---|---|
A1 | A |
A4 | E |
A2 | A |
A5 | E |
A13 | B |
A3 | C |
A9 | I |
A8 | B |
A7 | G |
A20 | G |
A21 | B |
A23 | D |
A33 | E |
These are two unbalanced data frames. The first data frame is the recorded data set. The second one contains what is known about the gear associated with each id. I want to check whether what is recorded in the first data frame is known or unknown. Specifically, for each id individually, I want to check whether the gear is the same in both data frames.
Ideally the result must be:
id | gear | CHECK |
---|---|---|
A1 | A | TRUE |
A2 | B | FALSE |
A3 | C | TRUE |
A4 | D | FALSE |
A5 | E | TRUE |
A6 | F | N/A |
A7 | G | TRUE |
A8 | H | FALSE |
A9 | I | TRUE |
A10 | G | N/A |
library(dplyr)  # dplyr re-exports tibble()
# Recorded data
id1 = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10")
Gear1 = c("A","B","C","D","E","F","G","H","I","G")
dat1 = tibble(id1, Gear1); dat1
# Known gear per id
id2 = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33")
Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
dat2 = tibble(id2, Gear2); dat2
How can I do this in R using the dplyr package? Any help?
You can use a left_join and then compare the two columns:
library(dplyr)
dat1 %>%
left_join(dat2, by = c("id1" = "id2")) %>%
mutate(CHECK = Gear1 == Gear2) %>%
select(id = id1, gear = Gear1, CHECK)
# A tibble: 10 × 3
id gear CHECK
<chr> <chr> <lgl>
1 A1 A TRUE
2 A2 B FALSE
3 A3 C TRUE
4 A4 D FALSE
5 A5 E TRUE
6 A6 F NA
7 A7 G TRUE
8 A8 H FALSE
9 A9 I TRUE
10 A10 G NA
Have a look at the dplyr documentation on how to use joins.
Thanks to Ritchie Sacramento for the trick of doing the renaming directly in the select function.
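As a rough sketch of the same comparison without a join, the known gear can be looked up by position with base R's match(). This assumes each id appears at most once in dat2 (true for the sample data); the name result is only illustrative.

```r
library(dplyr)

# Sample data from the question
dat1 <- tibble(
  id1   = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10"),
  Gear1 = c("A","B","C","D","E","F","G","H","I","G")
)
dat2 <- tibble(
  id2   = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33"),
  Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
)

# match() gives the row of dat2 holding each id1, or NA when the id is absent;
# indexing with NA yields NA, so unknown ids end up with CHECK = NA
result <- dat1 %>%
  mutate(CHECK = Gear1 == dat2$Gear2[match(id1, dat2$id2)]) %>%
  select(id = id1, gear = Gear1, CHECK)

result
```

If dat2 could contain the same id more than once, match() would silently use only the first occurrence, whereas left_join would return one row per match, so the join is the safer default.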
Try this:尝试这个:
dat1 |>
mutate(check = ifelse(!id1 %in% dat2$id2,
NA,
ifelse(paste(id1, Gear1) %in% paste(dat2$id2, dat2$Gear2),
TRUE,
FALSE)))
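The nested ifelse above covers three cases: the id is unknown (NA), the id and gear pair is known (TRUE), and everything else (FALSE). As a sketch, the same logic can be written with dplyr's case_when, which some find easier to read. The name result is illustrative, and the native pipe needs R 4.1 or later.

```r
library(dplyr)

# Sample data from the question
dat1 <- tibble(
  id1   = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10"),
  Gear1 = c("A","B","C","D","E","F","G","H","I","G")
)
dat2 <- tibble(
  id2   = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33"),
  Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
)

result <- dat1 |>
  mutate(check = case_when(
    !id1 %in% dat2$id2 ~ NA,                                     # id not known at all
    paste(id1, Gear1) %in% paste(dat2$id2, dat2$Gear2) ~ TRUE,  # id and gear both match
    TRUE ~ FALSE                                                 # id known, gear differs
  ))

result
```

Conditions are evaluated in order, so the unknown-id case has to come first, just as it does in the outer ifelse.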
library(tidyverse)
dat1 = rename(dat1, id = 'id1')
dat2 = rename(dat2, id = 'id2')
check_data = dat1 %>%
full_join(dat2, by='id') %>%
mutate(check = ifelse(Gear1==Gear2, TRUE, FALSE)) %>%
filter(! is.na(Gear1))
Output:
check_data
# A tibble: 10 x 4
id Gear1 Gear2 check
<chr> <chr> <chr> <lgl>
1 A1 A A TRUE
2 A2 B A FALSE
3 A3 C C TRUE
4 A4 D E FALSE
5 A5 E E TRUE
6 A6 F NA NA
7 A7 G G TRUE
8 A8 H B FALSE
9 A9 I I TRUE
10 A10 G NA NA