
R Remove duplicates only for one group

I looked through the answers to similar duplicate-removal questions but couldn't find a case like this one. I want to remove duplicates only for one group and keep the rest as it is. Can this be achieved without creating a temporary table?

Example:

I want to remove duplicates only for ID == "B"; I don't care whether there are duplicates in the other IDs.

library(dplyr)

dt <- tibble(ID = rep(LETTERS[1:3], 3),
       VAL = rep(1:3, 3),
       VAL2 = rep(1:3, 3)) %>% 
  arrange(ID)

This is what I normally use to remove duplicates across two columns:

dt %>%
  group_by(ID) %>%
  distinct(VAL, VAL2, .keep_all = TRUE)

This of course deduplicates every ID. I could use filter to create a new table and work from there, but I am looking for a way to remove the duplicates for ID == "B" without touching the other IDs. Can this be achieved without creating a temporary table?

My current workflow:

# deduplicate only the "B" rows into a temporary table
B <- dt %>%
  filter(ID == "B") %>%
  distinct(VAL, VAL2, .keep_all = TRUE)


dt %>% 
  filter(ID != "B") %>% 
  bind_rows(B)

# A tibble: 7 x 3
  ID      VAL  VAL2
  <chr> <int> <int>
1 A         1     1
2 A         1     1
3 A         1     1
4 C         3     3
5 C         3     3
6 C         3     3
7 B         2     2

You can use negative subsetting to drop only the duplicated rows where ID == "B".

i <- which(dt$ID == "B")
dt[-i[duplicated(dt[i,])],]
#dt[-i[duplicated(dt[i,c("VAL", "VAL2")])],] #Alternative limiting to VAL and VAL2
#  ID      VAL  VAL2
#  <chr> <int> <int>
#1 A         1     1
#2 A         1     1
#3 A         1     1
#4 B         2     2
#5 C         3     3
#6 C         3     3
#7 C         3     3
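One caveat with the negative-index form above: if the "B" rows happen to contain no duplicates, the index is empty, and `dt[-integer(0), ]` returns zero rows rather than the whole table. A minimal sketch of a logical-mask variant that avoids this, assuming the same example data rebuilt here as a plain data.frame (the names `is_b`, `drop_row`, and `res` are only illustrative):

```r
# Example data, same shape as in the question
dt <- data.frame(ID   = rep(LETTERS[1:3], 3),
                 VAL  = rep(1:3, 3),
                 VAL2 = rep(1:3, 3),
                 stringsAsFactors = FALSE)
dt <- dt[order(dt$ID), ]

# TRUE for rows that belong to "B" AND repeat an earlier row
is_b     <- dt$ID == "B"
drop_row <- is_b & duplicated(dt[, c("ID", "VAL", "VAL2")])

# Keep everything that is not flagged; row order is preserved
res <- dt[!drop_row, ]

# The pitfall: an empty negative index selects nothing
empty_case <- dt[-integer(0), ]
```

Because the mask is all FALSE when nothing is duplicated, `dt[!drop_row, ]` then simply returns every row.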

Or you can subset dt, apply unique to the selected rows, and combine the two pieces with rbind.

i <- dt$ID == "B"
rbind(dt[!i,], unique(dt[i,]))
#  ID      VAL  VAL2
#  <chr> <int> <int>
#1 A         1     1
#2 A         1     1
#3 A         1     1
#4 C         3     3
#5 C         3     3
#6 C         3     3
#7 B         2     2

It can certainly be done much as you already did, in a single pipeline:

dt %>%
  filter(ID == "B") %>%
  distinct(VAL, VAL2, .keep_all = TRUE) %>%
  bind_rows(dt %>% filter(ID != "B"))
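If the original row order matters, a single `filter()` call may be enough. This is only a sketch of one possible approach, not something taken from the answers above: it keeps every row that is either not in "B" or is the first occurrence of its (ID, VAL, VAL2) combination. The name `res` is illustrative.

```r
library(dplyr)

# Example data from the question
dt <- tibble(ID   = rep(LETTERS[1:3], 3),
             VAL  = rep(1:3, 3),
             VAL2 = rep(1:3, 3)) %>%
  arrange(ID)

# Keep non-"B" rows untouched; within "B" keep only first occurrences.
# data.frame(ID, VAL, VAL2) builds the key that duplicated() checks.
res <- dt %>%
  filter(ID != "B" | !duplicated(data.frame(ID, VAL, VAL2)))
```

Since `ID` is part of the key, a row can only be flagged as a duplicate of another row with the same ID, so the other groups are never affected.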

It can also be achieved with data.table:

library(data.table)
setDT(dt)  # converts dt to a data.table by reference
# keep all non-"B" rows, plus the first occurrence of each "B" row
dt[ID != "B" | !duplicated(dt, by = c("ID", "VAL", "VAL2"))]
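Note that `setDT()` changes the class of `dt` in place. If the original object should stay as it was, a copy can be made with `as.data.table()` instead. A minimal sketch under that assumption, with the example data rebuilt as a plain data.frame and the names `df`, `DT`, and `res` chosen only for illustration:

```r
library(data.table)

# Example data, same shape as in the question
df <- data.frame(ID   = rep(LETTERS[1:3], 3),
                 VAL  = rep(1:3, 3),
                 VAL2 = rep(1:3, 3),
                 stringsAsFactors = FALSE)
df <- df[order(df$ID), ]

# as.data.table() returns a copy, so df itself remains a data.frame
DT <- as.data.table(df)

# Same condition as above: all non-"B" rows plus first occurrences in "B"
res <- DT[ID != "B" | !duplicated(DT, by = c("ID", "VAL", "VAL2"))]
```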
