Replace multiple columns with NA if a value is in another data frame

Question

I have two data.frames like this:

#df1
ID     a1      a2     a3      b1      b2      b3     Date
3xy    Evan    Greg   Ryan   Ben      Bob     Alex   12/3
4lm    John    Bill   Sue    Randy    Mark    Seth   12/5

#df2
Name
Evan
Mark
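
For anyone who wants to reproduce this, here is one way to build the two data frames shown above. Treating every column as character (stringsAsFactors = FALSE) is an assumption; the real data may hold factors instead.

```r
# Rebuild the example data from the printed tables.
# Assumption: all columns are plain character vectors.
df1 <- data.frame(
  ID   = c("3xy", "4lm"),
  a1   = c("Evan", "John"),
  a2   = c("Greg", "Bill"),
  a3   = c("Ryan", "Sue"),
  b1   = c("Ben", "Randy"),
  b2   = c("Bob", "Mark"),
  b3   = c("Alex", "Seth"),
  Date = c("12/3", "12/5"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(Name = c("Evan", "Mark"), stringsAsFactors = FALSE)
```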

If a name from any of the "a" columns appears in df2$Name, I want to replace all of the "a" columns in that row with NA. Same for the "b" columns. My desired output would look like this:

ID     a1      a2     a3      b1      b2      b3     Date
3xy    NA      NA     NA     Ben      Bob     Alex   12/3
4lm    John    Bill   Sue    NA       NA      NA     12/5

I've found several other posts that appear to be on similar topics, but I haven't found a way to get this to work. I've been able to replace the names in df1 that appear in df2 with NA using the code below, but haven't figured out how to replace the other columns that begin with the same letter:

df1[apply(df1, 2, function(df1) df1 %in% df2$Name)] <- NA

This gives me an output like this:

ID     a1      a2     a3      b1      b2      b3     Date
3xy    NA      Greg   Ryan   Ben      Bob     Alex   12/3
4lm    John    Bill   Sue    Randy    NA      Seth   12/5

I also keep trying different ifelse statements, but with no success.

Answer 1

We can split the dataset into the 'a' and 'b' column groups, then loop over the rows of each group and set a whole row to NA if any of its values matches the 'Name' column of 'df2':

nm1 <- names(df1)[c(-1, -8)]
lst <- lapply(split.default(df1[nm1], sub("\\d+", "", nm1)), function(x) {
         x[apply(x, 1, function(y) any(y %in% df2$Name)),] <- NA
     x})
df1[nm1] <- do.call(cbind, unname(lst))
df1
#   ID   a1   a2   a3   b1   b2   b3 Date
#1 3xy <NA> <NA> <NA>  Ben  Bob Alex 12/3
#2 4lm John Bill  Sue <NA> <NA> <NA> 12/5
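
To see what the split step does before anything is overwritten, the sketch below runs only the grouping and the per-row check. The data frames are rebuilt here so the snippet runs on its own (character columns are an assumption), and the names prefix, groups and hit are illustrative, not part of the answer's code.

```r
df1 <- data.frame(
  ID = c("3xy", "4lm"),
  a1 = c("Evan", "John"), a2 = c("Greg", "Bill"), a3 = c("Ryan", "Sue"),
  b1 = c("Ben", "Randy"), b2 = c("Bob", "Mark"),  b3 = c("Alex", "Seth"),
  Date = c("12/3", "12/5"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(Name = c("Evan", "Mark"), stringsAsFactors = FALSE)

nm1 <- names(df1)[c(-1, -8)]     # drop ID and Date, keep a1..b3
prefix <- sub("\\d+", "", nm1)   # strip the digits, leaving the group letter

# split.default() splits a data frame by columns: one data frame per prefix
groups <- split.default(df1[nm1], prefix)

# For each group, flag the rows where any value appears in df2$Name
hit <- lapply(groups, function(x) {
  unname(apply(x, 1, function(y) any(y %in% df2$Name)))
})
hit
```

With the example data, row 1 is flagged in the "a" group (Evan) and row 2 in the "b" group (Mark), which are exactly the rows the answer blanks out.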

Another option is melt/dcast from data.table:

library(data.table)
dcast(melt(setDT(df1), measure = patterns("^a\\d+", "^b\\d+"),
    value.name = c('a', 'b'))[, c('a', 'b') := lapply(.SD, function(x) 
  replace(x, any(x %in% df2$Name), NA)), ID, .SDcols = a:b][],
        ID + Date ~ variable, value.var = c('a', 'b'), sep='')
#    ID Date   a1   a2  a3  b1  b2   b3
#1: 3xy 12/3   NA   NA  NA Ben Bob Alex
#2: 4lm 12/5 John Bill Sue  NA  NA   NA
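
The replace() call above relies on a small trick: its index argument is a single TRUE or FALSE, which base R recycles across the whole vector. A minimal illustration with made-up vectors (blank_if_match is a hypothetical helper name, used only here):

```r
lookup <- c("Evan", "Mark")

# Blank the whole vector when any element is found in the lookup
blank_if_match <- function(x) replace(x, any(x %in% lookup), NA)

with_match    <- blank_if_match(c("Evan", "Greg", "Ryan"))  # index is TRUE
without_match <- blank_if_match(c("John", "Bill", "Sue"))   # index is FALSE

with_match
without_match
```

A TRUE index selects every element, so all of them become NA; a FALSE index selects none, so the vector comes back unchanged.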

Answer 2

library(tidyverse)
df3 <- df1 %>%
  gather(key, value, -ID, -Date) %>%
  mutate(group = substr(key, 1, 1)) %>%
  select(group, ID, value) %>%
  inner_join(df2, by = c("value" = "Name")) %>%
  select(group, ID)

df1 %>%
  gather(key, value, -ID, -Date) %>%
  mutate(group = substr(key, 1, 1)) %>%
  anti_join(df3) %>%
  select(-group) %>%
  spread(key, value) %>%
  select(ID, matches("^a"), matches("^b"), Date)

Output:

# A tibble: 2 x 8
     ID    a1    a2    a3    b1    b2    b3  Date
* <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1   3xy  <NA>  <NA>  <NA>   Ben   Bob  Alex  12/3
2   4lm  John  Bill   Sue  <NA>  <NA>  <NA>  12/5

Answer 3

Here is a dplyr/tidyr approach:

library(dplyr)
library(tidyr)

df1 <- df1 %>%
  gather(Type, Names, -c(ID, Date)) %>%
  mutate(type2 = gsub("\\d", "", Type)) %>%
  group_by(type2, ID) %>%
  mutate(names2 = ifelse(any(Names %in% df2$Name), "", Names),
         Names = ifelse(names2 == "", NA, Names)) %>%
  ungroup() %>%
  select(-type2, -names2)

which results in a long-format table:

       ID   Date  Type Names
   <fctr> <fctr> <chr> <chr>
 1  3xy     12/3    a1  <NA>
 2  4lm     12/5    a1  John
 3  3xy     12/3    a2  <NA>
 4  4lm     12/5    a2  Bill
 5  3xy     12/3    a3  <NA>
 6  4lm     12/5    a3   Sue
 7  3xy     12/3    b1   Ben
 8  4lm     12/5    b1  <NA>
 9  3xy     12/3    b2   Bob
10  4lm     12/5    b2  <NA>
11  3xy     12/3    b3  Alex
12  4lm     12/5    b3  <NA>
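
The result above stays in long format, while the question asks for the original wide layout. One possible way to get there is sketched below. It uses pivot_longer()/pivot_wider(), the current tidyr replacements for gather()/spread(), so it is a variation on this answer rather than the answer's own code. The data frames are rebuilt with character columns (an assumption), and wide is an illustrative name.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  ID = c("3xy", "4lm"),
  a1 = c("Evan", "John"), a2 = c("Greg", "Bill"), a3 = c("Ryan", "Sue"),
  b1 = c("Ben", "Randy"), b2 = c("Bob", "Mark"),  b3 = c("Alex", "Seth"),
  Date = c("12/3", "12/5"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(Name = c("Evan", "Mark"), stringsAsFactors = FALSE)

wide <- df1 %>%
  # long format: one row per ID/column pair
  pivot_longer(-c(ID, Date), names_to = "Type", values_to = "Names") %>%
  # group by row ID and column prefix ("a" or "b")
  group_by(ID, type2 = gsub("\\d", "", Type)) %>%
  # blank the whole group if any of its names is in df2$Name
  mutate(Names = if (any(Names %in% df2$Name)) NA_character_ else Names) %>%
  ungroup() %>%
  select(-type2) %>%
  # back to one row per ID
  pivot_wider(names_from = Type, values_from = Names) %>%
  select(ID, matches("^[ab]\\d"), Date)

wide
```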

3 answers

Answer 1: 2, accepted, 2017-12-07 16:46:42
Answer 2: 1, 2017-12-07 16:53:01
Answer 3: 0, 2017-12-07 17:24:51
