R: Remove duplicates only for one group
I looked through the existing answers on duplicates but couldn't find a case similar to this one. I want to remove duplicates for only one group and keep the rest as it is. Can this be achieved without creating a temporary table?
Example:
I want to remove duplicates only for ID == "B"; I don't care whether there are duplicates in other IDs.
library(dplyr)
dt <- tibble(ID = rep(LETTERS[1:3], 3),
VAL = rep(1:3, 3),
VAL2 = rep(1:3, 3)) %>%
arrange(ID)
This is what I normally use to find duplicates in two columns:
dt %>%
group_by(ID) %>%
distinct(VAL, VAL2, .keep_all = TRUE)
This will of course remove duplicates in every ID. I can use filter, create a new table and work from there, but I'm looking for a way to remove the duplicates for ID == "B" without touching the other IDs. Can this be achieved without creating a temp table?
My current workflow:
B <- dt %>%
filter(ID == "B") %>%
distinct(VAL, VAL2, .keep_all = TRUE)
dt %>%
filter(ID != "B") %>%
bind_rows(B)
# A tibble: 7 x 3
ID VAL VAL2
<chr> <int> <int>
1 A 1 1
2 A 1 1
3 A 1 1
4 C 3 3
5 C 3 3
6 C 3 3
7 B 2 2
You can use negative subsetting with duplicated, restricted to the rows where ID == "B".
i <- which(dt$ID == "B")
dt[-i[duplicated(dt[i,])],]
#dt[-i[duplicated(dt[i,c("VAL", "VAL2")])],] #Alternative limiting to VAL and VAL2
# ID VAL VAL2
# <chr> <int> <int>
#1 A 1 1
#2 A 1 1
#3 A 1 1
#4 B 2 2
#5 C 3 3
#6 C 3 3
#7 C 3 3
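One thing to watch with the negative subsetting above: if group "B" happens to contain no duplicates, the negative index is empty and R returns zero rows instead of the full table. A minimal sketch of a logical-index variant that avoids this, using only base R. The data frame is the question's data rebuilt with data.frame, and dt2 is a hypothetical edge case added for illustration:

```r
# Same data as in the question, rebuilt in base R (no dplyr needed)
dt <- data.frame(ID   = rep(LETTERS[1:3], each = 3),
                 VAL  = rep(1:3, each = 3),
                 VAL2 = rep(1:3, each = 3))

# Keep a row if it is outside group "B", or if it is the first
# occurrence of its (ID, VAL, VAL2) combination
keep <- dt$ID != "B" | !duplicated(dt[c("ID", "VAL", "VAL2")])
res <- dt[keep, ]

# Hypothetical edge case: group "B" has no duplicates at all
dt2 <- dt
dt2$VAL[dt2$ID == "B"] <- 4:6

# Negative subsetting: the index vector is empty, so every row is dropped
i <- which(dt2$ID == "B")
empty_case <- dt2[-i[duplicated(dt2[i, ])], ]
nrow(empty_case)   # 0

# Logical subsetting: nothing to remove, so all rows are kept
keep2 <- dt2$ID != "B" | !duplicated(dt2[c("ID", "VAL", "VAL2")])
safe_case <- dt2[keep2, ]
nrow(safe_case)    # 9
```

The logical version also keeps the rows in their original order, so "B" stays between "A" and "C".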
Or you can subset dt, apply unique to the selected rows, and combine the pieces with rbind.
i <- dt$ID == "B"
rbind(dt[!i,], unique(dt[i,]))
# ID VAL VAL2
# <chr> <int> <int>
#1 A 1 1
#2 A 1 1
#3 A 1 1
#4 C 3 3
#5 C 3 3
#6 C 3 3
#7 B 2 2
It can certainly be done in a way similar to what you already did:
dt %>%
filter(ID == "B") %>%
distinct(VAL, VAL2, .keep_all = TRUE) %>%
bind_rows(dt %>% filter(ID != "B"))
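If the goal is to stay in one pipeline without filtering dt twice, a possible sketch is to group by all three columns and keep either every non-"B" row or the first row of each group. This is an alternative formulation, not taken from the answers here, and it assumes dplyr is installed:

```r
library(dplyr)

# Data from the question
dt <- tibble(ID = rep(LETTERS[1:3], 3),
             VAL = rep(1:3, 3),
             VAL2 = rep(1:3, 3)) %>%
  arrange(ID)

res <- dt %>%
  group_by(ID, VAL, VAL2) %>%
  # row_number() counts within each (ID, VAL, VAL2) group, so for "B"
  # only the first row of each combination survives; other IDs pass through
  filter(ID != "B" | row_number() == 1) %>%
  ungroup()

res
```

Because filter does not reorder rows, the deduplicated "B" row stays in its original position rather than moving to the end as it does with bind_rows.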
It can also be achieved with data.table:
library(data.table)
setDT(dt)
dt[ID != "B" | !duplicated(dt, by=c("ID", "VAL", "VAL2"))]