
How can I match the values of a column against another data frame in R using dplyr?

I have two datasets that look like this. The first is:

id    gear
A1    A
A2    B
A3    C
A4    D
A5    E
A6    F
A7    G
A8    H
A9    I
A10   G

And the second:

id    gear2
A1    A
A4    E
A2    A
A5    E
A13   B
A3    C
A9    I
A8    B
A7    G
A20   G
A21   B
A23   D
A33   E

These are two unbalanced data frames. The first is the recorded data set. The second contains what is known about the gear associated with each id. I want to check whether what is recorded in the first data frame matches what is known. Specifically, for each id individually, I want to check whether the gear is the same in both data frames. Ideally, the result should be:

id    gear  CHECK
A1    A     TRUE
A2    B     FALSE
A3    C     TRUE
A4    D     FALSE
A5    E     TRUE
A6    F     N/A
A7    G     TRUE
A8    H     FALSE
A9    I     TRUE
A10   G     N/A

library(dplyr)

# First (recorded) data set
id1 = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10")
Gear1 = c("A","B","C","D","E","F","G","H","I","G")
dat1 = tibble(id1, Gear1); dat1

# Second (known) data set
id2 = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33")
Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
dat2 = tibble(id2, Gear2); dat2

How can I do this in R using the dplyr package? Any help?

You can use a left_join and then compare the two columns:

library(dplyr)

dat1 %>% 
  left_join(dat2, by = c("id1" = "id2")) %>% 
  mutate(CHECK = Gear1 == Gear2) %>% 
  select(id = id1, gear = Gear1, CHECK)

# A tibble: 10 × 3
   id    gear  CHECK
   <chr> <chr> <lgl>
 1 A1    A     TRUE 
 2 A2    B     FALSE
 3 A3    C     TRUE 
 4 A4    D     FALSE
 5 A5    E     TRUE 
 6 A6    F     NA   
 7 A7    G     TRUE 
 8 A8    H     FALSE
 9 A9    I     TRUE 
10 A10   G     NA   

Have a look at the dplyr documentation on how to use joins.

Edit

Thanks to Ritchie Sacramento for the tip on doing the renaming directly in the select function.
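
To see why the unmatched ids end up as NA, it can help to look at the join on its own before comparing. The following is a minimal sketch using the question's data; the names `joined` and `result` and the duplicate-id guard are illustrative additions, not part of the original answer.

```r
library(dplyr)

dat1 <- tibble(
  id1   = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10"),
  Gear1 = c("A","B","C","D","E","F","G","H","I","G")
)
dat2 <- tibble(
  id2   = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33"),
  Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
)

# Assumption: each id appears at most once in dat2. If it did not,
# left_join() would return more than one row for that id.
stopifnot(anyDuplicated(dat2$id2) == 0)

# Step 1: the join alone. Ids missing from dat2 (A6, A10) get Gear2 = NA.
joined <- left_join(dat1, dat2, by = c("id1" = "id2"))

# Step 2: comparing against NA yields NA, which is where the NA in CHECK comes from.
result <- mutate(joined, CHECK = Gear1 == Gear2)
result
```

Because the comparison is a plain `==`, no extra handling is needed to get NA for ids that are unknown in the second data frame.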

Try this:

dat1 |> 
  mutate(check = ifelse(!id1 %in% dat2$id2, 
                        NA,
                        ifelse(paste(id1, Gear1) %in% paste(dat2$id2, dat2$Gear2), 
                               TRUE, 
                               FALSE)))
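
The nested ifelse above can also be written as a positional lookup. This is a sketch of an alternative using base R's match(), not the answerer's own code; the names `pos` and `checked` are made up for illustration.

```r
library(dplyr)

dat1 <- tibble(
  id1   = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10"),
  Gear1 = c("A","B","C","D","E","F","G","H","I","G")
)
dat2 <- tibble(
  id2   = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33"),
  Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
)

# match() gives the position of each id1 within dat2$id2, or NA if the id is absent.
pos <- match(dat1$id1, dat2$id2)

# Indexing with NA returns NA, so unknown ids produce NA in the comparison.
checked <- dat1 %>%
  mutate(check = Gear1 == dat2$Gear2[pos])
checked
```

Note that match() only uses the first occurrence of an id in dat2, so this sketch assumes the ids there are unique, as they are in the question's data.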
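
# Alternative: give both data frames a shared id column, full join,
# then keep only the rows that come from dat1
library(tidyverse)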

dat1 = rename(dat1, id = 'id1')
dat2 = rename(dat2, id = 'id2')

check_data = dat1 %>% 
  full_join(dat2, by='id') %>% 
  mutate(check = ifelse(Gear1==Gear2, TRUE, FALSE)) %>% 
  filter(! is.na(Gear1))

Output:

check_data
# A tibble: 10 x 4
   id    Gear1 Gear2 check
   <chr> <chr> <chr> <lgl>
 1 A1    A     A     TRUE 
 2 A2    B     A     FALSE
 3 A3    C     C     TRUE 
 4 A4    D     E     FALSE
 5 A5    E     E     TRUE 
 6 A6    F     NA    NA   
 7 A7    G     G     TRUE 
 8 A8    H     B     FALSE
 9 A9    I     I     TRUE 
10 A10   G     NA    NA   
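
The rows where check is NA are the ids that do not appear in the second data frame at all. If only those unknown ids are of interest, one possible sketch is dplyr's anti_join(), shown here on the question's original column names; the name `unknown` is illustrative.

```r
library(dplyr)

dat1 <- tibble(
  id1   = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10"),
  Gear1 = c("A","B","C","D","E","F","G","H","I","G")
)
dat2 <- tibble(
  id2   = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33"),
  Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
)

# anti_join() keeps the rows of dat1 that have no matching id in dat2.
unknown <- anti_join(dat1, dat2, by = c("id1" = "id2"))
unknown
```

With the data above this should leave only the rows for A6 and A10, the same ids that show NA in the check column.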


