I have a dataset as follows:
library(tibble)
df <- tibble(
  X = c("1", "1", "3", "3", "5", "5", "6", "6", "6"),
  Y = c("X", "Y", "X", "Z", "Z", "Y", "Y", "X", "Z"))
I want the output below: keep only the rows whose X is one of the two highest values in the column, using a general command rather than filtering on a specific value.
output <- tibble(
  X = c("6", "6", "6", "5", "5"),
  Y = c("Z", "X", "Y", "Y", "Z"))
I tried top_n, but it does not seem to work when the top values appear multiple times.
You can combine unique, sort and tail to get the second-largest value, and then use it to subset with >=.
df[df$X >= tail(sort(unique(df$X)), 2)[1],]
# X Y
#5 5 Z
#6 5 Y
#7 6 Y
#8 6 X
#9 6 Z
Data
df <- data.frame(
X = c(1, 1, 3, 3, 5, 5, 6, 6, 6),
Y = c("X", "Y", "X", "Z", "Z", "Y", "Y", "X", "Z"))
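Since the question uses tibble and top_n, the same idea can also be sketched with dplyr. This is only one possible approach, assuming dplyr is installed and that X is numeric, as in the Data above. dense_rank gives tied values the same rank without gaps, so ranking on desc(X) and keeping ranks 1 and 2 retains every row that holds one of the two largest distinct values. The name result is just illustrative.

```r
library(dplyr)

df <- tibble(
  X = c(1, 1, 3, 3, 5, 5, 6, 6, 6),
  Y = c("X", "Y", "X", "Z", "Z", "Y", "Y", "X", "Z"))

# dense_rank(desc(X)) is 1 for the largest distinct value of X,
# 2 for the second largest, and so on; ties share a rank.
result <- df %>%
  filter(dense_rank(desc(X)) <= 2)

print(result)
```

If X is stored as character, as in the question, the comparison is lexicographic (for example "10" sorts before "5"), so it may be safer to convert it first with as.numeric(X).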