Remove duplicates based on a specific category in another column

Question

I would like to remove duplicate IDs from my data based on the category column. A subset of my data is as follows:

df <- data.frame(ID=c(1,2,3,4,1,4,2),
                 category=c("a","b","c","d","b","a","a"))
df

  ID category
1  1        a
2  2        b
3  3        c
4  4        d
5  1        b
6  4        a
7  2        a

If a duplicated ID appears in category b, I need to keep that row and remove the rows with the same ID from other categories. I have no preference when the duplicated IDs come only from categories other than b. So, my desired outcome is:

  ID category
1  2        b
2  3        c
3  4        d
4  1        b

I have already read this post: "R: Remove duplicates from a dataframe based on categories in a column", but couldn't find my answer there.

Answer 1

We can use arrange so that the 'b' category rows are placed at the top, and then get the distinct rows by 'ID':

library(dplyr)

df %>%
  arrange(category != 'b') %>%    # 'b' rows first (FALSE sorts before TRUE)
  distinct(ID, .keep_all = TRUE)  # keep the first row for each ID

-output

  ID category
1  2        b
2  1        b
3  3        c
4  4        d

Or using base R:

# Move 'b' rows to the top, then keep the first row for each ID
df1 <- df[order(df$category != 'b'), ]
df1[!duplicated(df1$ID), ]
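
Both versions rely on how R sorts logical values: `category != 'b'` is FALSE for the 'b' rows, and FALSE sorts before TRUE, so those rows move to the top and are the ones kept when later rows with the same ID are dropped. As a minimal sketch on the question's data, the same idea can be wrapped in a small helper. The name `keep_preferred` and its `preferred` argument are hypothetical, introduced here only for illustration:

```r
# Data from the question
df <- data.frame(ID = c(1, 2, 3, 4, 1, 4, 2),
                 category = c("a", "b", "c", "d", "b", "a", "a"))

# Hypothetical helper: keep one row per ID, preferring rows in `preferred`
keep_preferred <- function(data, preferred = "b") {
  # FALSE sorts before TRUE, so preferred rows come first;
  # order() leaves tied rows in their original order
  sorted <- data[order(data$category != preferred), ]
  # duplicated() flags every occurrence of an ID after the first one
  sorted[!duplicated(sorted$ID), ]
}

result <- keep_preferred(df)
result
```

Because the base R subsetting keeps the original row names, the rows in `result` are labelled with their positions in `df` rather than renumbered from 1.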

Answer 2

In base R you could do:

subset(df, !category %in% category[ID %in% ID[category == 'b'] & category != 'b'])
  ID category
1  2        b
2  3        c
3  4        d
4  1        b
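
Note that the expression above filters on category values: it drops every row whose category matches that of a non-'b' row sharing an ID with a 'b' row. For this sample that means all 'a' rows, which happens to give the desired result. A sketch that filters by ID instead might look like the following, assuming that any ID seen in category 'b' should keep only its 'b' row and that other IDs should keep their first row. The names `ids_in_b`, `kept` and `result` are illustrative:

```r
# Data from the question
df <- data.frame(ID = c(1, 2, 3, 4, 1, 4, 2),
                 category = c("a", "b", "c", "d", "b", "a", "a"))

# IDs that have at least one row in category 'b'
ids_in_b <- unique(df$ID[df$category == "b"])

# Drop non-'b' rows whose ID also appears in category 'b'
kept <- df[!(df$ID %in% ids_in_b & df$category != "b"), ]

# Among the remaining rows, keep the first row seen for each ID
result <- kept[!duplicated(kept$ID), ]
result
```

This keeps the rows in their original order, so the IDs come out as 2, 3, 4, 1, matching the order in the desired outcome from the question.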

Answer 1
1, accepted, 2021-06-29 17:27:03

Answer 2
0, 2021-06-29 17:49:05
