
R : Finding the nth duplicated item

So I have a vector that looks like this:

x <- c(1,1,1,3,4,5,6,7,7,7,7)

I know about the duplicated() function, but I want R to return a logical vector that is TRUE from the nth occurrence of each value onward. For example, if I am interested in the 3rd occurrence of a value (and any later ones), the result should be:

FALSE  FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  FALSE TRUE  TRUE
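To pin down the requested behaviour, here is a plain loop sketch that keeps a running count per distinct value. The function name flag_nth_occurrence is hypothetical; it is a reference for the semantics, not a fast solution.

```r
# Hypothetical helper: TRUE from the n-th occurrence of each value onward
flag_nth_occurrence <- function(x, n) {
  id <- match(x, unique(x))        # integer id for each distinct value
  seen <- integer(max(id, 0L))     # running count per distinct value
  out <- logical(length(x))
  for (i in seq_along(x)) {
    seen[id[i]] <- seen[id[i]] + 1L
    out[i] <- seen[id[i]] >= n
  }
  out
}

flag_nth_occurrence(c(1,1,1,3,4,5,6,7,7,7,7), 3)
```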

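One possibility is to count the duplicates within each value. The nth occurrence is the (n-1)th duplicate, so for n = 3 the threshold is 2: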

ave(duplicated(x), x, FUN = cumsum) >= 2

 [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
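To see why the threshold is n - 1, it can help to look at the intermediate values. A small sketch that re-creates x and defines n so it runs on its own:

```r
x <- c(1,1,1,3,4,5,6,7,7,7,7)
n <- 3

dup <- duplicated(x)                    # TRUE for every occurrence after the first
dup_count <- ave(dup, x, FUN = cumsum)  # running number of duplicates within each value
dup_count
# [1] 0 1 2 0 0 0 0 0 1 2 3
dup_count >= n - 1                      # n-th occurrence = (n-1)-th duplicate
```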

If the same value can occur in separate runs (like the 1s at both ends below) and each run should be counted on its own, number the runs first:

x <- c(1,1,1,3,4,5,6,7,7,7,7,1,1,1)

rleid <- with(rle(x), rep(seq_along(values), lengths))
ave(duplicated(rleid), rleid, FUN = cumsum) >= 2

 [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE
[13] FALSE  TRUE
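When runs are what matter, the position of each element within its run can also be read straight off rle(): sequence() turns the run lengths into 1, 2, 3, ... counters that restart at every new run. This is a swapped-in alternative, not the approach used above, but it should give the same result for this x.

```r
x <- c(1,1,1,3,4,5,6,7,7,7,7,1,1,1)
n <- 3

runs <- rle(x)
# position of each element within its own run, restarting at every new run
pos_in_run <- sequence(runs$lengths)
pos_in_run >= n
```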

Alternatively, use ave() with seq_along to number each occurrence directly:

n <- 3
ave(x, x, FUN = seq_along) >= n
# [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
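One caveat worth checking: ave() returns a vector of the same type as its first argument, so if x is character the counts come back as strings and >= n becomes a string comparison. A sketch of a type-safe variant, where the helper name occurrence is hypothetical, passes seq_along(x) as the values and x only as the grouping:

```r
n <- 3

# Hypothetical helper: running occurrence count of each value, always integer,
# so the >= comparison stays numeric even when x is character.
occurrence <- function(x) ave(seq_along(x), x, FUN = seq_along)

y <- rep("a", 12)
occurrence(y) >= n
# By contrast, ave(y, y, FUN = seq_along) returns "1", "2", ..., "12", and
# "10" >= 3 is compared as text ("10" sorts before "3"), giving FALSE.
```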

The same idea in dplyr:

library(dplyr)

n <- 3
data.frame(x) %>%
   group_by(x) %>%
   # .env$n is the n defined above, not dplyr's n() helper
   mutate(dup = row_number() >= .env$n)
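The question asks for a logical vector rather than a data frame; a sketch that extracts the new column with pull() (grouped mutate keeps the original row order):

```r
library(dplyr)

x <- c(1,1,1,3,4,5,6,7,7,7,7)
n <- 3

dup <- data.frame(x) %>%
  group_by(x) %>%
  mutate(dup = row_number() >= .env$n) %>%
  pull(dup)   # plain logical vector, one element per element of x
dup
```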

Or with data.table:

library(data.table)
n <- 3
# number the rows within each group of equal x; the trailing [] prints the result
as.data.table(x)[, dup := seq_len(.N) >= n, by = x][]
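data.table also provides rowid(), which returns the within-group row number directly, and rleid(), which numbers runs. A sketch of both cases without building a table first:

```r
library(data.table)

x <- c(1,1,1,3,4,5,6,7,7,7,7)
n <- 3

# rowid() numbers the elements within each group of equal values: 1, 2, 3, ...
rowid(x) >= n

# For repeated runs, number the elements within each run instead
x2 <- c(1,1,1,3,4,5,6,7,7,7,7,1,1,1)
rowid(rleid(x2)) >= n
```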
