R: returning the 5 rows with the highest values

Question

Sample data

mysample <- data.frame(ID = 1:100, kWh = rnorm(100))

I'm trying to automate the process of returning the rows in a data frame that contain the 5 highest values in a certain column. In the sample data, the 5 highest values in the "kWh" column can be found using the code:

(tail(sort(mysample$kWh), 5))

which in my case returns:

[1] 1.477391 1.765312 1.778396 2.686136 2.710494

I would like to create a table of the rows whose value in column 2 ("kWh") is one of these numbers. I am attempting to use this code:

mysample[mysample$kWh == (tail(sort(mysample$kWh), 5)),]

This returns:

   ID      kWh  
87 87 1.765312

I would like it to return the 5 rows that contain the figures above in the "kWh" column. I'm sure I've missed something basic, but I can't figure it out.
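For reference, the attempt above returns a single row because == compares element by element. The length-5 vector of top values is recycled to the length of the kWh column, so a row matches only when its value happens to equal the recycled value at the same position. Because 100 is a multiple of 5, R recycles silently without a warning. The sketch below uses a small hand-made data frame (hypothetical values, not the original random sample) to make this visible, and uses %in% to test set membership instead.

```r
# Hypothetical 10-row data frame standing in for mysample (values chosen by hand)
df <- data.frame(ID = 1:10, kWh = c(4, 10, 1, 9, 5, 8, 2, 6, 3, 7))
top5 <- tail(sort(df$kWh), 5)   # 6 7 8 9 10

# == recycles top5 to length 10 (6 7 8 9 10 6 7 8 9 10) and compares
# position by position, so only row 4 (kWh 9 vs recycled 9) matches
df$kWh == top5

# %in% asks "is this value anywhere in top5?" for every row instead
hits <- df[df$kWh %in% top5, ]
hits   # rows with ID 2, 4, 6, 8, 10, in their original row order
```

Note that %in% keeps the original row order (use hits[order(-hits$kWh), ] to sort it), and if the 5th-highest value is tied with another row, it returns more than 5 rows.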

Answer 1

We can use rank on the negated values, so that the highest kWh gets rank 1:

mysample$Rank <- rank(-mysample$kWh)
head(mysample[order(mysample$Rank),],5)
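To check what the rank-based approach returns, here is a minimal sketch on a small hypothetical data frame whose answer is known in advance (it is not the original random sample). rank uses ties.method = "average" by default, so tied values would share a fractional rank; ordering by that rank still works.

```r
# Hypothetical data with a known answer: the top 5 kWh values are 9, 8, 7, 6, 5
df <- data.frame(ID = 1:8, kWh = c(3, 9, 1, 7, 5, 8, 2, 6))

# rank(-x) gives the largest value rank 1, the next largest rank 2, and so on
df$Rank <- rank(-df$kWh)

# Sort rows by rank and keep the first 5
top <- head(df[order(df$Rank), ], 5)
top$ID   # 2 6 4 8 5
```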

If we don't need to create a new column, we can use order directly (as @Jaap mentioned, in three alternative ways):

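# Order descending and take the first 5 rows
head(mysample[order(-mysample$kWh), ], 5)
# Order ascending and take the last 5 rows (the highest value is printed last)
tail(mysample[order(mysample$kWh), ], 5)
# Or index the first 5 rows of the descending order directly
# (the trailing comma selects rows; [1:5] alone would select columns)
mysample[order(-mysample$kWh), ][1:5, ]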
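Since the goal in the question is to automate this, the order-based approach can be wrapped in a small function. This is a sketch only: top_n_rows is a hypothetical name, not a base R function. It uses order(..., decreasing = TRUE) rather than order(-x), which also works for columns that cannot be negated, such as character columns.

```r
# top_n_rows() is a hypothetical helper, not part of base R
top_n_rows <- function(data, col, n = 5) {
  # Row indices sorted from highest to lowest value in the chosen column;
  # head() keeps at most n of them, so this also works when nrow(data) < n
  idx <- head(order(data[[col]], decreasing = TRUE), n)
  # drop = FALSE keeps a data frame even if data has a single column
  data[idx, , drop = FALSE]
}

mysample <- data.frame(ID = 1:100, kWh = rnorm(100))
top_n_rows(mysample, "kWh", 5)
```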

1 answer

solution1: 6, ACCEPTED, 2016-02-01 15:40:36