Reference column in another data frame by row entry
I have DF1 like this:
ID Name Team
222717 Bob Badgers
321817 James Tigers
521917 Eric Possums
And DF2 like this:
Badgers Tigers Possums
222717 438283 521917
789423 978748 251233
I want to check whether each ID in DF1 appears under the corresponding team name in DF2. For example, in the first row, Bob's ID does appear under his team name, "Badgers," in DF2. James' ID does not appear under his team name, "Tigers," in DF2. I was thinking of adding a column that marks whether it appears or not, but can't figure out how to reference the column in DF2. Here's what I tried:
test <- mutate(DF1,validID=ifelse(ID%in%DF2$DF1$Team,"Yes",NA))
The DF2$DF1$Team part is where I'm stuck. How do I reference the column in DF2 that corresponds to the team listed in DF1? I'm also open to alternative suggestions on how to manipulate the data to achieve this task.
The %in% operator is a compact way to call the match function. mapply is the canonical way to pass several columns to a function so that their corresponding values are evaluated in sequence, one row at a time.
DF1$right2 <- mapply( function(a,b) {a %in% DF2[[b]]}, a=DF1$ID, b=as.character(DF1$Team) )
#============
> DF1
ID Name Team right2
1 222717 Bob Badgers TRUE
2 321817 James Tigers FALSE
3 521917 Eric Possums TRUE
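For anyone who wants to run the answer above, here is a minimal self-contained sketch. The data frame construction is a reconstruction of the tables printed in the question (numeric IDs and character team names are assumptions), and `in_team` and `via_match` are illustrative names only. It also spells out the `match` call that `%in%` wraps.

```r
# Rebuild the question's tables (assumed types: numeric IDs, character teams)
DF1 <- data.frame(
  ID   = c(222717, 321817, 521917),
  Name = c("Bob", "James", "Eric"),
  Team = c("Badgers", "Tigers", "Possums"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Badgers = c(222717, 789423),
  Tigers  = c(438283, 978748),
  Possums = c(521917, 251233)
)

# DF2[[b]] selects a column by a name held in a variable;
# DF2$b would look for a column literally called "b"
DF1$in_team <- mapply(function(a, b) a %in% DF2[[b]],
                      a = DF1$ID, b = DF1$Team)

# x %in% table is shorthand for match(x, table, nomatch = 0L) > 0L
via_match <- mapply(function(a, b) match(a, DF2[[b]], nomatch = 0L) > 0L,
                    a = DF1$ID, b = DF1$Team)

print(DF1)
```

The key point for the original question is the `[[` indexing: it accepts a character string, so each row's team name can be used to pick the matching column of DF2.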
Honestly I find mapply hard to conceptualise, and in any case 42's answer seems to return FALSE for Eric, when it ought to return TRUE. Most likely a typo, but for future reference it's helpful to give your sample data in a format that lets you just copy the code and create the right objects!
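One common way to share data in a copy-and-paste format is `dput()`, which prints R code that recreates an object. The sketch below assumes DF1 looks like the table in the question; `DF1_copy` is a hypothetical name used only to show the round trip.

```r
# Assumed reconstruction of DF1 from the question
DF1 <- data.frame(
  ID   = c(222717, 321817, 521917),
  Name = c("Bob", "James", "Eric"),
  Team = c("Badgers", "Tigers", "Possums"),
  stringsAsFactors = FALSE
)

# dput() prints R code that rebuilds the object; paste that output into a post
dput(DF1)

# Round trip: evaluating the printed code yields an equivalent data frame
DF1_copy <- eval(parse(text = capture.output(dput(DF1))))
```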
This is a quick way of doing it that avoids map or apply functions, using only tidyverse tools (and a magrittr alias, but you can sub that out). Here I split "finding the right column" and "checking if the ID is there" into two steps, but you could combine them if you wanted.
library(tidyverse)
library(magrittr)
df1 <- tibble(ID = c(222717, 321817, 521917),
              Name = c("Bob", "James", "Eric"),
              Team = c("Badgers", "Tigers", "Possums")
)
df2 <- tibble(Badgers = c(222717, 789423),
              Tigers = c(438283, 978748),
              Possums = c(521917, 251233)
)
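df1 %>%
  # rowwise() makes Team a single value per row, so each row looks up its own
  # column even when the row order differs from the column order in df2
  rowwise() %>%
  mutate(team_col = colnames(df2) %>% equals(Team) %>% which()) %>%
  mutate(id_exists_for_team = ID %in% as_vector(df2[team_col])) %>%
  ungroup()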
#> # A tibble: 3 x 5
#> ID Name Team team_col id_exists_for_team
#> <dbl> <chr> <chr> <int> <lgl>
#> 1 222717 Bob Badgers 1 TRUE
#> 2 321817 James Tigers 2 FALSE
#> 3 521917 Eric Possums 3 TRUE
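Since the question was open to reshaping the data, another option is to turn df2 into a long lookup table and join against it. This is only a sketch and not something taken from the answers above: `roster` and `result` are hypothetical names, and it assumes the IDs are stored as the same type in both tables.

```r
library(dplyr)
library(tidyr)

df1 <- tibble(ID = c(222717, 321817, 521917),
              Name = c("Bob", "James", "Eric"),
              Team = c("Badgers", "Tigers", "Possums"))
df2 <- tibble(Badgers = c(222717, 789423),
              Tigers = c(438283, 978748),
              Possums = c(521917, 251233))

# One row per (Team, ID) pair that actually occurs in df2
roster <- df2 %>%
  pivot_longer(everything(), names_to = "Team", values_to = "ID") %>%
  distinct() %>%
  mutate(id_exists_for_team = TRUE)

# Rows of df1 without a matching pair get NA from the join; turn that into FALSE
result <- df1 %>%
  left_join(roster, by = c("Team", "ID")) %>%
  mutate(id_exists_for_team = coalesce(id_exists_for_team, FALSE))

print(result)
```

The join matches on team and ID together, so it does not depend on the order of rows in df1 or columns in df2.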