Remove duplicates based on order
In R, I'm looking to keep only the first b and the first c after each a, and remove any further b or c that appear before the next a (please note the numbering). I've got the following:
1 a
2 b
3 c
4 a
5 b
6 c
7 a
8 b
9 c
10 b
11 c
12 a
13 b
14 c
15 c
I'm looking to reduce it to:
1 a
2 b
3 c
4 a
5 b
6 c
7 a
8 b
9 c
12 a
13 b
14 c
I'm trying to do this within a dplyr pipe if possible. Any ideas?
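How about this?
library(dplyr)
d <- data.frame(lets = c("a", "b", "c", "a", "b", "c", "a", "b", "c", "b", "c", "a", "b", "c", "c"))
d %>%
  # lag1 is the previous value, lag2 the one before that
  mutate(lag1 = lag(lets),
         lag2 = lag(lag1)) %>%
  # keep the first two rows, then only rows where the last three values all differ
  filter(is.na(lag2) |
           !(lets == lag1 | lets == lag2 | lag1 == lag2)) %>%
  select(lets)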
lets
1 a
2 b
3 c
4 a
5 b
6 c
7 a
8 b
9 c
10 a
11 b
12 c
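To see why this filter drops the rows it does, it can help to store the condition in a column instead of filtering on it. This is only a sketch on the sample data, assuming dplyr is installed; the names flagged and keep are made up here for illustration.

```r
library(dplyr)

d <- data.frame(lets = c("a", "b", "c", "a", "b", "c", "a", "b", "c",
                         "b", "c", "a", "b", "c", "c"))

# Same condition as in the filter() above, stored in a column
# (hypothetical name `keep`) so the dropped rows stay visible.
flagged <- d %>%
  mutate(lag1 = lag(lets),
         lag2 = lag(lag1),
         keep = is.na(lag2) |
           !(lets == lag1 | lets == lag2 | lag1 == lag2))

# Original row numbers that the filter would drop
which(!flagged$keep)
# [1] 10 11 15
```

The condition keeps a row only when it and the two rows before it are all different. That matches the repeating a, b, c pattern in this data, but it would also drop, for example, an a that follows "a b", so treat it as specific to input shaped like the sample.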
One possible solution:
df = read.table(text="1 a
2 b
3 c
4 a
5 b
6 c
7 a
8 b
9 c
10 b
11 c
12 a
13 b
14 c
15 c",header=F)
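library(dplyr)
# x counts the a's seen so far, so every a starts a new group
df %>% mutate(x=cumsum(V2=='a')) %>%
  group_by(x) %>%
  # within each group, keep only the first occurrence of each value
  filter(!duplicated(V2)) %>%
  ungroup() %>%
  # drop the helper column
  select(-x)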
Output:
# A tibble: 12 x 2
      V1 V2
   <int> <fctr>
 1     1 a
 2     2 b
 3     3 c
 4     4 a
 5     5 b
 6     6 c
 7     7 a
 8     8 b
 9     9 c
10    12 a
11    13 b
12    14 c
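The step doing the work is cumsum(V2=='a'), which gives every row the index of the most recent a. The sketch below keeps that index and the duplicated() flag as columns so they can be inspected. It assumes dplyr is installed and builds the data frame directly rather than with read.table; the names steps and dup are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  V1 = 1:15,
  V2 = c("a", "b", "c", "a", "b", "c", "a", "b", "c",
         "b", "c", "a", "b", "c", "c")
)

steps <- df %>%
  mutate(x = cumsum(V2 == "a")) %>%   # group index: increases at every "a"
  group_by(x) %>%
  mutate(dup = duplicated(V2)) %>%    # TRUE for repeats within the group
  ungroup()

steps$x
# [1] 1 1 1 2 2 2 3 3 3 3 3 4 4 4 4

# Row numbers from the question that are flagged as repeats
steps$V1[steps$dup]
# [1] 10 11 15
```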
Note that this removes all duplicated elements within each group, where a new group starts every time an a is encountered. If you only want to remove duplicated b's and c's, consider:
filter(!(duplicated(V2) & (V2=='b' | V2=='c')))
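The two filters only differ when a group contains a repeated value other than b or c. Here is a small sketch of that difference, assuming dplyr is installed. The data, including the letter d, is hypothetical and not from the question, and the names df2, grouped, all_dups and bc_dups are made up for illustration.

```r
library(dplyr)

# Hypothetical data: one group starting at "a", with a repeated "d"
# (not in the question) and a repeated "b".
df2 <- data.frame(V1 = 1:6, V2 = c("a", "b", "d", "d", "b", "c"))

grouped <- df2 %>%
  mutate(x = cumsum(V2 == "a")) %>%
  group_by(x)

# Drops every repeated value within the group
all_dups <- grouped %>%
  filter(!duplicated(V2)) %>%
  ungroup() %>%
  select(-x)

# Drops only repeated "b" and "c"; the second "d" survives
bc_dups <- grouped %>%
  filter(!(duplicated(V2) & (V2 == "b" | V2 == "c"))) %>%
  ungroup() %>%
  select(-x)

all_dups$V1
# [1] 1 2 3 6
bc_dups$V1
# [1] 1 2 3 4 6
```

On the data in the question, which contains only a, b and c, both filters should return the same 12 rows.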