I have a vector that looks like this:
x <- c(1,1,1,3,4,5,6,7,7,7,7)
I know about the duplicated function, but I want R to return a logical vector that is TRUE from the nth occurrence of a value onward. For example, if I am interested in the 3rd (or later) occurrence of each value, the result should be:
FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
One possibility could be:
ave(duplicated(x), x, FUN = cumsum) >= 2
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
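The threshold 2 in the line above is n - 1 for n = 3, because duplicated() does not flag the first occurrence of a value. A minimal sketch that generalizes this to any n, wrapped in a hypothetical helper called nth_dup (the name is an assumption for illustration, not part of the original answer):

```r
# Hypothetical helper: TRUE from the n-th occurrence of each value onward.
nth_dup <- function(x, n) {
  # duplicated() is FALSE for the first occurrence of a value, so the running
  # count of duplicates within each group reaches n - 1 at the n-th occurrence.
  ave(duplicated(x), x, FUN = cumsum) >= n - 1
}

x <- c(1, 1, 1, 3, 4, 5, 6, 7, 7, 7, 7)
res <- nth_dup(x, 3)
print(res)
```

With n = 2 this should reduce to plain duplicated(x).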
If the same value can appear in more than one run, and each run should be counted separately:
x <- c(1,1,1,3,4,5,6,7,7,7,7,1,1,1)
rleid <- with(rle(x), rep(seq_along(values), lengths))
ave(duplicated(rleid), rleid, FUN = cumsum) >= 2
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
[13] FALSE TRUE
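A shorter base R variant of the run-based idea, offered as a sketch rather than as the original answer's method: sequence() applied to the run lengths from rle() numbers the positions within each run directly, so no run id is needed. The variable names n and res are chosen here for illustration.

```r
x <- c(1, 1, 1, 3, 4, 5, 6, 7, 7, 7, 7, 1, 1, 1)
n <- 3

# rle(x)$lengths holds the length of each consecutive run;
# sequence() expands each length k into 1, 2, ..., k and concatenates them.
pos_in_run <- sequence(rle(x)$lengths)

# TRUE from the n-th element of each run onward.
res <- pos_in_run >= n
print(res)
```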
We can use ave with seq_along to number the occurrences within each group:
n <- 3
ave(x, x, FUN = seq_along) >= n
# [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
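One caveat worth hedging: ave() returns a vector of the same type as its first argument, so if x is a character vector the counts come back as strings and the comparison with n becomes a string comparison. A minimal sketch that sidesteps this by passing an integer index as the first argument (the example vector is made up for illustration):

```r
# Made-up character input; the grouping variable is still x itself.
x <- c("a", "b", "a", "a")
n <- 3

# seq_along(x) is an integer vector, so the per-group counters stay integer
# regardless of the type of x.
occurrence <- ave(seq_along(x), x, FUN = seq_along)

res <- occurrence >= n
print(res)
```

For numeric x, as in the question, this should give the same result as ave(x, x, FUN = seq_along) >= n.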
The dplyr equivalent would be:
library(dplyr)
data.frame(x) %>%
group_by(x) %>%
mutate(dup = row_number() >= n)
We can also use data.table:
library(data.table)
n <- 3
as.data.table(x)[, dup := seq_len(.N) >= n, x]