If value is in another dataframe, replace multiple columns with NA

I have two data.frames like this:

#df1
ID     a1      a2     a3      b1      b2      b3     Date
3xy    Evan    Greg   Ryan   Ben      Bob     Alex   12/3
4lm    John    Bill   Sue    Randy    Mark    Seth   12/5

#df2
Name
Evan
Mark

If a name from any of the "a" columns appears in df2$Name, I want to replace all of the "a" columns in that row with NA. The same goes for the "b" columns. My desired output would look like this:

ID     a1      a2     a3      b1      b2      b3     Date
3xy    NA      NA     NA     Ben      Bob     Alex   12/3
4lm    John    Bill   Sue    NA       NA      NA     12/5

I've found several other posts on similar topics, but I haven't found a way to get this to work. Using the code below, I've been able to replace the names in df1 that appear in df2 with NA, but I haven't figured out how to also replace the other columns that begin with the same letter:

df1[apply(df1, 2, function(df1) df1 %in% df2$Name)] <- NA

This gives me output like this:

ID     a1      a2     a3      b1      b2      b3     Date
3xy    NA      Greg   Ryan   Ben      Bob     Alex   12/3
4lm    John    Bill   Sue    Randy    NA      Seth   12/5

I also keep trying different ifelse statements, but with no success.
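The attempt above tests each cell on its own, which is why only the matching names are blanked. One possible way to lift the same test to whole column groups is sketched here. It assumes the columns are plain character vectors (the post does not show how the tables were built) and that the group prefixes are exactly "a" and "b"; the names hit, cols and rows are only illustrative.

```r
# Example data from the question; stringsAsFactors = FALSE is an assumption
df1 <- data.frame(
  ID = c("3xy", "4lm"),
  a1 = c("Evan", "John"), a2 = c("Greg", "Bill"), a3 = c("Ryan", "Sue"),
  b1 = c("Ben", "Randy"), b2 = c("Bob", "Mark"),  b3 = c("Alex", "Seth"),
  Date = c("12/3", "12/5"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(Name = c("Evan", "Mark"), stringsAsFactors = FALSE)

# Cell-level matches: TRUE where a single value is found in df2$Name
hit <- matrix(as.matrix(df1) %in% df2$Name,
              nrow = nrow(df1),
              dimnames = list(NULL, names(df1)))

# Group-level: for each prefix, find the rows with at least one match
# and set every column of that group to NA in those rows
for (prefix in c("a", "b")) {
  cols <- grep(paste0("^", prefix, "\\d+$"), names(df1), value = TRUE)
  rows <- rowSums(hit[, cols, drop = FALSE]) > 0
  df1[rows, cols] <- NA
}
df1
```

The match matrix is computed once, before any cell is overwritten, so blanking the "a" group cannot affect the test for the "b" group.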

We can split the dataset into the 'a' and 'b' column groups, then, within each group, loop through the rows and set a whole row to NA if any of its values matches the 'Name' column of 'df2':

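# Columns to check: everything except ID (1st) and Date (8th)
nm1 <- names(df1)[c(-1, -8)]
# Split those columns into groups by prefix ("a", "b"); within each group,
# set the whole row to NA if any of its values is in df2$Name
lst <- lapply(split.default(df1[nm1], sub("\\d+", "", nm1)), function(x) {
  x[apply(x, 1, function(y) any(y %in% df2$Name)), ] <- NA
  x
})
# Write the groups back into df1
df1[nm1] <- do.call(cbind, unname(lst))
df1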
#   ID   a1   a2   a3   b1   b2   b3 Date
#1 3xy <NA> <NA> <NA>  Ben  Bob Alex 12/3
#2 4lm John Bill  Sue <NA> <NA> <NA> 12/5

Another option is melt/dcast from data.table:

library(data.table)
dcast(melt(setDT(df1), measure = patterns("^a\\d+", "^b\\d+"),
    value.name = c('a', 'b'))[, c('a', 'b') := lapply(.SD, function(x) 
  replace(x, any(x %in% df2$Name), NA)), ID, .SDcols = a:b][],
        ID + Date ~ variable, value.var = c('a', 'b'), sep='')
#    ID Date   a1   a2  a3  b1  b2   b3
#1: 3xy 12/3   NA   NA  NA Ben Bob Alex
#2: 4lm 12/5 John Bill Sue  NA  NA   NA

library(tidyverse)

# Find the (group, ID) pairs in which any name appears in df2$Name
df3 <- df1 %>%
  gather(key, value, -ID, -Date) %>%
  mutate(group = substr(key, 1, 1)) %>%
  select(group, ID, value) %>%
  inner_join(df2, by = c("value" = "Name")) %>%
  select(group, ID)

# Drop those pairs, then spread back to wide; the dropped cells come back as NA
df1 %>%
  gather(key, value, -ID, -Date) %>%
  mutate(group = substr(key, 1, 1)) %>%
  anti_join(df3, by = c("group", "ID")) %>%
  select(-group) %>%
  spread(key, value) %>%
  select(ID, matches("^a"), matches("^b"), Date)

Output:

# A tibble: 2 x 8
     ID    a1    a2    a3    b1    b2    b3  Date
* <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1   3xy  <NA>  <NA>  <NA>   Ben   Bob  Alex  12/3
2   4lm  John  Bill   Sue  <NA>  <NA>  <NA>  12/5

Here is a dplyr/tidyr approach:

library(dplyr)
library(tidyr)

df1 <- df1 %>%
  gather(Type, Names, -c(ID, Date)) %>%
  mutate(type2 = gsub("\\d", "", Type)) %>%   # "a1" -> "a", "b2" -> "b"
  group_by(type2, ID) %>%
  # blank the whole group if any of its names appears in df2$Name
  mutate(Names = replace(Names, any(Names %in% df2$Name), NA)) %>%
  ungroup() %>%
  select(-type2)

which results in a long-format data frame:

       ID   Date  Type Names
   <fctr> <fctr> <chr> <chr>
 1  3xy     12/3    a1  <NA>
 2  4lm     12/5    a1  John
 3  3xy     12/3    a2  <NA>
 4  4lm     12/5    a2  Bill
 5  3xy     12/3    a3  <NA>
 6  4lm     12/5    a3   Sue
 7  3xy     12/3    b1   Ben
 8  4lm     12/5    b1  <NA>
 9  3xy     12/3    b2   Bob
10  4lm     12/5    b2  <NA>
11  3xy     12/3    b3  Alex
12  4lm     12/5    b3  <NA>
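
The result above is in long format, while the question asks for the original wide layout. A minimal sketch of the reverse step with tidyr's spread() follows. The long table is typed in by hand from the output above, its columns are assumed to be character rather than factor, and the names long and wide are only illustrative.

```r
library(tidyr)

# Long-format result copied from the table above (character columns assumed)
long <- data.frame(
  ID    = rep(c("3xy", "4lm"), times = 6),
  Date  = rep(c("12/3", "12/5"), times = 6),
  Type  = rep(c("a1", "a2", "a3", "b1", "b2", "b3"), each = 2),
  Names = c(NA, "John", NA, "Bill", NA, "Sue",
            "Ben", NA, "Bob", NA, "Alex", NA),
  stringsAsFactors = FALSE
)

# One column per Type again, then move Date back to the end
wide <- spread(long, Type, Names)
wide <- wide[, c("ID", "a1", "a2", "a3", "b1", "b2", "b3", "Date")]
wide
```

spread() is used here to match the gather() call in the answer; current tidyr releases mark both as superseded in favour of pivot_longer() and pivot_wider().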
