I am having trouble understanding the output from dplyr's top_n function. Can anybody help?
library(dplyr)
n <- 10
df <- data.frame(ref = sample(letters, n), score = rnorm(n))
# top 5 rows by score using dplyr
print(top_n(df, 5, score))
# top 5 rows by score using base R ordering
print(df[order(df$score, decreasing = TRUE)[1:5], ])
The output from top_n is not ordered by score as I expected. Compare it with the output from the order function:
  ref      score
1   i 0.71556494
2   p 0.04463846
3   v 0.37290990
4   g 1.53206194
5   f 0.86307107
   ref      score
7    g 1.53206194
10   f 0.86307107
1    i 0.71556494
6    v 0.37290990
4    p 0.04463846
The documentation I have read also implies that the top_n results should be ordered by the specified column, for example https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
Both outputs contain the same rows, but top_n does not rearrange them: it only filters, so the selected rows stay in their original order.
You can get the same result as df[order(df$score, decreasing = TRUE)[1:5], ] by adding arrange():
top_n(df, 5, score) %>% arrange(desc(score))
Flipping the ordering around, df[order(df$score, decreasing = FALSE)[1:5], ] is equivalent to top_n(df, -5, score) %>% arrange(score).
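As a quick check of the points above, here is a minimal sketch that compares top_n with the base R approach from the question. The seed value and the variable names (top5, top5_sorted, base_top5) are my own assumptions for illustration, and the comparison assumes there are no tied scores, which is the case for values drawn with rnorm.

```r
library(dplyr)

set.seed(42)  # assumed seed, only so the run is repeatable
n <- 10
df <- data.frame(ref = sample(letters, n), score = rnorm(n))

# top_n() only filters: the surviving rows keep their original order
top5 <- top_n(df, 5, score)

# sort explicitly to put the highest score first
top5_sorted <- top5 %>% arrange(desc(score))

# base R approach from the question
base_top5 <- df[order(df$score, decreasing = TRUE)[1:5], ]

# row names differ between the two results, so compare the columns
same_rows_same_order <- identical(top5_sorted$ref, base_top5$ref) &&
  identical(top5_sorted$score, base_top5$score)

# top_n() alone matches a plain filter on the rank, in original row order
kept_in_place <- identical(top5$ref, df$ref[rank(-df$score) <= 5])

print(same_rows_same_order)
print(kept_in_place)
```

Comparing columns rather than whole data frames sidesteps the row-name difference visible in the question's output, where the base R result keeps the original row numbers.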
My misunderstanding came from my reading of the documentation linked in the question and discussed in the comments. Despite what some documentation claims, top_n does not generate output ordered by wt.