If value is in another dataframe, replace multiple columns with NA
I have two data.frames like this:
#df1
ID a1 a2 a3 b1 b2 b3 Date
3xy Evan Greg Ryan Ben Bob Alex 12/3
4lm John Bill Sue Randy Mark Seth 12/5
#df2
Name
Evan
Mark
If a name from any of the "a" columns appears in df2$Name, I want to replace all of the "a" columns in that row with NA. Same for the "b" columns. My desired output would look like this:
ID a1 a2 a3 b1 b2 b3 Date
3xy NA NA NA Ben Bob Alex 12/3
4lm John Bill Sue NA NA NA 12/5
I've found several other posts that appear to be on similar topics, but I haven't found a way to get this to work. I've been able to replace the names in df1 that appear in df2 with NA using the code below, but haven't figured out how to replace the other columns that begin with the same letter:
df1[apply(df1, 2, function(df1) df1 %in% df2$Name)] <- NA
This gives me an output like this:
ID a1 a2 a3 b1 b2 b3 Date
3xy NA Greg Ryan Ben Bob Alex 12/3
4lm John Bill Sue Randy NA Seth 12/5
I have also tried various ifelse statements, without success.
We can split the dataset into the 'a' and 'b' column groups, then loop through the rows of each group and set the whole row to NA if any of its values matches the 'Name' column of 'df2':
nm1 <- names(df1)[c(-1, -8)]
lst <- lapply(split.default(df1[nm1], sub("\\d+", "", nm1)), function(x) {
x[apply(x, 1, function(y) any(y %in% df2$Name)),] <- NA
x})
df1[nm1] <- do.call(cbind, unname(lst))
df1
# ID a1 a2 a3 b1 b2 b3 Date
#1 3xy <NA> <NA> <NA> Ben Bob Alex 12/3
#2 4lm John Bill Sue <NA> <NA> <NA> 12/5
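For comparison, the same idea can be sketched with a plain loop over the column prefixes. This is only an illustration, not part of the answer above: it assumes the columns are named with a single-letter prefix followed by digits, and it rebuilds df1 and df2 from the question (as character columns) so that it runs on its own.

```r
# Rebuild the example data from the question (character columns assumed).
df1 <- data.frame(
  ID = c("3xy", "4lm"),
  a1 = c("Evan", "John"), a2 = c("Greg", "Bill"), a3 = c("Ryan", "Sue"),
  b1 = c("Ben", "Randy"), b2 = c("Bob", "Mark"), b3 = c("Alex", "Seth"),
  Date = c("12/3", "12/5"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(Name = c("Evan", "Mark"), stringsAsFactors = FALSE)

for (prefix in c("a", "b")) {
  # Columns belonging to this group, e.g. a1, a2, a3.
  cols <- grep(paste0("^", prefix, "[0-9]+$"), names(df1), value = TRUE)
  # TRUE for each row where at least one column of the group is in df2$Name.
  hit <- Reduce(`|`, lapply(df1[cols], function(col) col %in% df2$Name))
  # Blank out the whole group for the matching rows.
  if (any(hit)) df1[hit, cols] <- NA
}

print(df1)
```

The explicit list of prefixes keeps the sketch easy to follow; with many groups, the prefixes could instead be derived from the column names.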
Another option is melt/dcast from data.table:
library(data.table)
dcast(melt(setDT(df1), measure = patterns("^a\\d+", "^b\\d+"),
value.name = c('a', 'b'))[, c('a', 'b') := lapply(.SD, function(x)
replace(x, any(x %in% df2$Name), NA)), ID, .SDcols = a:b][],
ID + Date ~ variable, value.var = c('a', 'b'), sep='')
# ID Date a1 a2 a3 b1 b2 b3
#1: 3xy 12/3 NA NA NA Ben Bob Alex
#2: 4lm 12/5 John Bill Sue NA NA NA
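The step that blanks a whole group relies on how replace() treats a length-one logical index: a single TRUE is recycled over the entire vector, while a single FALSE selects nothing. A small base R sketch of that behaviour, using made-up vector names (lookup, with_match, without_match) purely for illustration:

```r
lookup <- c("Evan", "Mark")

with_match    <- c("Evan", "Greg", "Ryan")  # contains a name from lookup
without_match <- c("Ben", "Bob", "Alex")    # contains none

# any(...) yields one TRUE/FALSE; used as an index it is recycled,
# so either every element is replaced or none is.
blanked <- replace(with_match, any(with_match %in% lookup), NA)
kept    <- replace(without_match, any(without_match %in% lookup), NA)

print(blanked)
print(kept)
```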
library(tidyverse)
df3 <- df1 %>%
gather(key, value, -ID, -Date) %>%
mutate(group = substr(key, 1, 1)) %>%
select(group, ID, value) %>%
inner_join(df2, by = c("value" = "Name")) %>%
select(group, ID)
df1 %>%
gather(key, value, -ID, -Date) %>%
mutate(group = substr(key, 1, 1)) %>%
anti_join(df3) %>%
select(-group) %>%
spread(key, value) %>%
select(ID, matches("^a"), matches("^b"), Date)
Output:
# A tibble: 2 x 8
ID a1 a2 a3 b1 b2 b3 Date
* <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 3xy <NA> <NA> <NA> Ben Bob Alex 12/3
2 4lm John Bill Sue <NA> <NA> <NA> 12/5
Here is a dplyr/tidyr approach:
library(dplyr)
library(tidyr)
df1 <- df1 %>%
  # reshape to long format: one row per ID/Date/column
  gather(Type, Names, -c(ID, Date)) %>%
  # column group: "a1" -> "a", "b2" -> "b"
  mutate(type2 = gsub("\\d", "", Type)) %>%
  group_by(type2, ID) %>%
  # blank the whole group if any of its names appears in df2$Name
  mutate(Names = if (any(Names %in% df2$Name)) NA_character_ else Names) %>%
  ungroup() %>%
  select(-type2)
which results in a long-format data frame:
ID Date Type Names
<fctr> <fctr> <chr> <chr>
1 3xy 12/3 a1 <NA>
2 4lm 12/5 a1 John
3 3xy 12/3 a2 <NA>
4 4lm 12/5 a2 Bill
5 3xy 12/3 a3 <NA>
6 4lm 12/5 a3 Sue
7 3xy 12/3 b1 Ben
8 4lm 12/5 b1 <NA>
9 3xy 12/3 b2 Bob
10 4lm 12/5 b2 <NA>
11 3xy 12/3 b3 Alex
12 4lm 12/5 b3 <NA>
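To get from the long result back to the wide layout asked for in the question, the data can be spread out again. The following is a sketch under a few assumptions: it uses pivot_longer()/pivot_wider() from tidyr 1.0 or later (the newer counterparts of gather()/spread()) rather than the functions used above, the intermediate column names key, value and group are made up for the example, and df1 and df2 are rebuilt as character columns so the block runs on its own.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  ID = c("3xy", "4lm"),
  a1 = c("Evan", "John"), a2 = c("Greg", "Bill"), a3 = c("Ryan", "Sue"),
  b1 = c("Ben", "Randy"), b2 = c("Bob", "Mark"), b3 = c("Alex", "Seth"),
  Date = c("12/3", "12/5"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(Name = c("Evan", "Mark"), stringsAsFactors = FALSE)

wide <- df1 %>%
  # long format: one row per ID/Date/column
  pivot_longer(-c(ID, Date), names_to = "key", values_to = "value") %>%
  # group by ID and by the first letter of the column name
  group_by(ID, group = substr(key, 1, 1)) %>%
  # blank the whole group if any value is listed in df2$Name
  mutate(value = if (any(value %in% df2$Name)) NA_character_ else value) %>%
  ungroup() %>%
  select(-group) %>%
  # back to one row per ID
  pivot_wider(names_from = key, values_from = value) %>%
  select(ID, matches("^[ab][0-9]+$"), Date)

print(wide)
```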