After searching for a while, I could not find an existing answer to this question. Suppose I have the following vector:
v <- c("a", "b", "b", "c","c","c", "d", "d", "d", "d")
How do I find the values that have more than 1 duplicate
(should be "c","c","c", "d", "d", "d", "d")
and more than 2 duplicates
(should be "d", "d", "d", "d")?
The function duplicated(v) only flags elements that repeat an earlier one; it does not say how many times each value occurs.
You can generate a table() of counts and then check which elements of v belong to the relevant subset of that table, e.g.
R> v <- c("a", "b", "b", "c","c","c", "d", "d", "d", "d")
R> tab <- table(v)
R> tab
v
a b c d
1 2 3 4
R> v[v %in% names(tab[tab > 2])]
[1] "c" "c" "c" "d" "d" "d" "d"
R> v[v %in% names(tab[tab > 3])]
[1] "d" "d" "d" "d"
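If you need this more than once, the table() idea above can be wrapped in a small function. This is only a minimal sketch: the name keep_frequent and its min_count argument are made up for illustration and are not part of base R or of the answer above. Note that min_count is the total number of occurrences, so "more than 1 duplicate" corresponds to min_count = 2.

```r
# Hypothetical helper (name and arguments are assumptions, not from the answer):
# keep the elements of x whose value occurs more than min_count times.
keep_frequent <- function(x, min_count) {
  tab <- table(x)                          # occurrences of each distinct value
  frequent <- names(tab)[tab > min_count]  # values above the threshold
  x[x %in% frequent]                       # original elements, original order
}

v <- c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d")
keep_frequent(v, 2)  # values occurring more than 2 times
keep_frequent(v, 3)  # values occurring more than 3 times
```

If no value passes the threshold, the function returns an empty character vector rather than an error.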
I would use ave to write a simple function like this:
myFun <- function(vector, thresh) {
  ind <- ave(rep(1, length(vector)), vector, FUN = length)
  vector[ind > thresh + 1] ## added "+1" to match your terminology
}
Here it is applied to "v":
myFun(v, 1)
# [1] "c" "c" "c" "d" "d" "d" "d"
myFun(v, 2)
# [1] "d" "d" "d" "d"
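To see why this works, it may help to look at the intermediate vector that ave builds inside myFun. The sketch below just repeats that step on its own; the variable name counts is an assumption chosen for illustration.

```r
v <- c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d")

# For each element, the size of the group (distinct value) it belongs to.
counts <- ave(rep(1, length(v)), v, FUN = length)
counts
# [1] 1 2 2 3 3 3 4 4 4 4

# "More than 1 duplicate" means the value occurs more than 1 + 1 times.
v[counts > 1 + 1]
```

Because counts lines up element by element with v, it can be used directly as a filter without going back through the names of a table.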
Of course, there is always "data.table":
library(data.table)
as.data.table(v)[, N := .N, by = v][N > 1 + 1]$v
# [1] "c" "c" "c" "d" "d" "d" "d"
as.data.table(v)[, N := .N, by = v][N > 2 + 1]$v
# [1] "d" "d" "d" "d"