R: remove duplicates for one group only

Question

I looked through the existing answers on removing duplicates but couldn't find a case similar to this one. I want to remove duplicates for only one group and keep the rest as it is. Can this be achieved without creating a temporary table?

Example:

I want to remove duplicates only for ID == "B"; I don't care whether the other IDs contain duplicates.

library(dplyr)

dt <- tibble(ID = rep(LETTERS[1:3], 3),
       VAL = rep(1:3, 3),
       VAL2 = rep(1:3, 3)) %>% 
  arrange(ID)

This is what I normally use to remove duplicates across two columns:

dt %>% 
  group_by(ID) %>% 
  distinct(VAL, VAL2, .keep_all = TRUE)

This of course deduplicates every ID. I could filter, create a new table, and work from there, but I am looking for a way to remove the duplicates for ID == "B" without touching the other IDs. Can this be achieved without creating a temporary table?

My current workflow:

B <- dt %>% 
  filter(ID == "B") %>% 
  distinct(VAL, VAL2, .keep_all = TRUE)


dt %>% 
  filter(ID != "B") %>% 
  bind_rows(B)

# A tibble: 7 x 3
  ID      VAL  VAL2
  <chr> <int> <int>
1 A         1     1
2 A         1     1
3 A         1     1
4 C         3     3
5 C         3     3
6 C         3     3
7 B         2     2

Answer 1

You can use negative subsetting to drop the rows that duplicated() flags within ID == "B".

i <- which(dt$ID == "B")
dt[-i[duplicated(dt[i,])],]
#dt[-i[duplicated(dt[i,c("VAL", "VAL2")])],] #Alternative limiting to VAL and VAL2
#  ID      VAL  VAL2
#  <chr> <int> <int>
#1 A         1     1
#2 A         1     1
#3 A         1     1
#4 B         2     2
#5 C         3     3
#6 C         3     3
#7 C         3     3
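One caveat with the negative-index form above: if the chosen group has no duplicates, the index becomes -integer(0), which selects zero rows instead of all of them. A minimal base R sketch that avoids this by building a logical mask is shown below. The helper name drop_dups_in_group and its arguments are hypothetical, and the example table is built with data.frame() so that the block runs without dplyr; it holds the same rows as the question's dt after arrange(ID).

```r
# Hypothetical helper: drop duplicated rows inside one ID group only.
# `cols` selects the columns compared when looking for duplicates.
drop_dups_in_group <- function(df, group, cols = names(df)) {
  # %in% treats a missing ID as "not in the group" instead of returning NA
  in_group <- df$ID %in% group
  # Start with "nothing is a duplicate", then flag duplicates among the
  # group's rows only; rows outside the group are never flagged
  dup <- logical(nrow(df))
  dup[in_group] <- duplicated(df[in_group, cols, drop = FALSE])
  df[!dup, , drop = FALSE]
}

# Same rows as the question's dt after arrange(ID)
dt <- data.frame(ID   = rep(LETTERS[1:3], each = 3),
                 VAL  = rep(1:3, each = 3),
                 VAL2 = rep(1:3, each = 3))

res <- drop_dups_in_group(dt, "B", cols = c("VAL", "VAL2"))

# Running it again on a group that has nothing left to drop
# returns the table unchanged rather than an empty one
unchanged <- drop_dups_in_group(res, "B")
```

Because the mask is logical, the original row order is kept and the "B" row stays between the "A" and "C" rows.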

Or you can split dt, apply unique() to the selected rows, and recombine the two parts with rbind(). Note that this moves the "B" rows to the end.

i <- dt$ID == "B"
rbind(dt[!i,], unique(dt[i,]))
#  ID      VAL  VAL2
#  <chr> <int> <int>
#1 A         1     1
#2 A         1     1
#3 A         1     1
#4 C         3     3
#5 C         3     3
#6 C         3     3
#7 B         2     2

Answer 2

It can certainly be done in much the same way as you already did, in a single pipeline:

dt %>% 
  filter(ID == "B") %>% 
  distinct(VAL, VAL2, .keep_all = TRUE) %>%
  bind_rows(dt %>% filter(ID != "B"))
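If the goal is to avoid splitting and re-binding altogether, one possible dplyr variant is to group by the columns that define a duplicate and keep a row when it is either outside "B" or the first row of its group. This is only a sketch of an alternative, not something taken from the answer; it assumes the same dt as in the question and that ID has no missing values.

```r
library(dplyr)

# Same example data as in the question
dt <- tibble(ID = rep(LETTERS[1:3], 3),
             VAL = rep(1:3, 3),
             VAL2 = rep(1:3, 3)) %>%
  arrange(ID)

res <- dt %>%
  # Rows sharing ID, VAL and VAL2 form one group of potential duplicates
  group_by(ID, VAL, VAL2) %>%
  # Keep every row outside "B"; inside "B" keep only the first row per group
  filter(ID != "B" | row_number() == 1) %>%
  ungroup()
```

Unlike the bind_rows() version, filter() keeps the rows in their original order, so the remaining "B" row stays between the "A" and "C" rows. Rows with a missing ID would need separate handling, because the comparison ID != "B" evaluates to NA for them.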

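It can also be achieved with data.table:

library(data.table)
# setDT() converts dt to a data.table by reference (no copy is made)
setDT(dt)
# keep a row if it is outside "B" or is not a duplicate on ID, VAL and VAL2
dt[ID != "B" | !duplicated(dt, by=c("ID", "VAL", "VAL2"))]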

2 answers

Answer 1
0 2020-12-14 10:44:03

Answer 2
0 (accepted) 2020-12-14 10:51:56
