Removing infrequent rows in a data frame

Question

Let's say I have the following very simple data frame:

a <- rep(5,30)
b <- rep(4,80)
d <- rep(7,55)

df <- data.frame(Column = c(a,b,d))

What would be the most generic way to remove all rows whose value appears fewer than 60 times?

I know you could say "in this case it's just a", but in my real data there are many more frequencies, so I wouldn't want to specify them one by one.

I was thinking of writing a loop that, for each value i, deletes its rows if the number of occurrences (via length()) is smaller than 60, but perhaps you have other ideas. Thanks in advance.

Answer 1

A solution using dplyr:

library(dplyr)

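df2 <- df %>%
  group_by(Column) %>%
  filter(n() >= 60) %>%   # keep only groups with at least 60 rows
  ungroup()               # drop the grouping so later steps see a plain tibble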

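Or a solution in base R:

# split() orders groups by factor level (ascending for numeric values),
# so sort the unique values to keep them aligned with targetID.
uniqueID <- sort(unique(df$Column))
targetID <- sapply(split(df, df$Column), function(x) nrow(x) >= 60)

df2 <- df[df$Column %in% uniqueID[targetID], , drop = FALSE]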

Answer 2

Using data.table:

library(data.table)
setDT(df)

df[Column %in% df[, .N, by = Column][N >= 60, Column]]

Answer 3

We create a frequency table and then subset the rows based on the count of values in 'Column':

tbl <- table(df$Column) >=60
subset(df, Column %in% names(tbl)[tbl])
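The frequency-table idea can also be wrapped in a small reusable function, which may help when there are many distinct values. This is only a sketch: the name drop_rare and its arguments (data, col, min_n) are hypothetical and not part of any package. It looks up each row's count in the table by the row's value.

```r
# Hypothetical helper: keep only rows whose value in `col` occurs at least
# `min_n` times. Names and arguments are illustrative assumptions.
drop_rare <- function(data, col, min_n) {
  values <- as.character(data[[col]])   # table names are character
  counts <- table(values)               # frequency of each distinct value
  # Look up each row's count by its value; rows with NA values are dropped.
  keep <- !is.na(values) & as.vector(counts[values]) >= min_n
  data[keep, , drop = FALSE]
}

# Example data from the question
df_example <- data.frame(Column = c(rep(5, 30), rep(4, 80), rep(7, 55)))
res <- drop_rare(df_example, "Column", 60)
```

With the question's data, only the 80 rows holding the value 4 should remain.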

Or with ave from base R

df[with(df, ave(Column, Column, FUN = length) >= 60), ]
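One caveat with the ave approach: ave writes the group lengths back into a vector of the same type as its first argument, so it can be fragile when 'Column' is character or factor rather than numeric. A sketch that sidesteps this by counting over row indices instead (df_chr is a made-up example, not the asker's data):

```r
# Hypothetical character-valued data to illustrate the non-numeric case
df_chr <- data.frame(
  Column = c(rep("x", 30), rep("y", 100), rep("z", 55)),
  stringsAsFactors = FALSE
)

# Count group sizes over integer row indices, grouped by Column,
# so the result stays an integer vector regardless of Column's type.
counts <- ave(seq_along(df_chr$Column), df_chr$Column, FUN = length)

df_chr_kept <- df_chr[counts >= 60, , drop = FALSE]
```

Here only the rows with "y" should be kept, since "x" and "z" each occur fewer than 60 times.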

Or we use data.table:

library(data.table)
setDT(df)[, .SD[.N >= 60], Column]

Or another option with data.table is to use .I to collect the row indices of the qualifying groups:

setDT(df)[df[, .I[.N >= 60], Column]$V1]

Answer 4

There is also a variant of Eric Watt's data.table answer that uses a join instead of %in%:

library(data.table)
setDT(df)
df[df[, .N, by = Column][N >= 60, .(Column)], on = "Column"]
