
Subset the highest Values of every Group in R

Data:

ID<-c(1,2,3,4,5,6,7,8)
Value<-c(5,4,7,2,6,3,9,4)
Group<-c(1,1,1,2,3,2,2,3)
data<-data.frame(ID,Value,Group)
I would like to take the 2 rows with the highest Value from every Group and put them into a new data frame.

The final result should look like this:

ID<-c(1,3,6,7,5,8)
Value<-c(5,7,3,9,6,4)
Group<-c(1,1,2,2,3,3)
Finaldata<-data.frame(ID,Value,Group)

My approach is:

Finaldata<-head(data[order(Value,decreasing=TRUE),],n=2) 

but I can't work out how to make it do this for every Group rather than just for the overall highest Values.

With "data.table" you can try something like this:

library(data.table)
as.data.table(data)[order(Group, -Value), head(.SD, 2), by = Group]
#    Group ID Value
# 1:     1  3     7
# 2:     1  1     5
# 3:     2  7     9
# 4:     2  6     3
# 5:     3  5     6
# 6:     3  8     4

Using dplyr: if you are using dplyr 0.3 (i.e. the devel version), slice is available; otherwise, you could use do. You can install the devel version with:

devtools::install_github("hadley/dplyr") #first you need to install `devtools`.  

Also, you can check the link https://github.com/hadley/dplyr

library(dplyr) 
data%>% 
    group_by(Group) %>%
    arrange(desc(Value)) %>%
    slice(1:2) # do(head(.,2)) #in dplyr 0.2

gives the result

#   ID Value Group
#1  3     7     1
#2  1     5     1
#3  7     9     2
#4  6     3     2
#5  5     6     3
#6  8     4     3

By using slice, you can get the 2nd highest value (i.e. slice(2)) for each group, or any range of rows from a starting row to an end row that the group actually has. In this example, slice(2:3) gives only 1 row for group 3, as there are only 2 rows in that group.
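
As a hedged aside for readers on a newer dplyr: assuming dplyr 1.0.0 or later is installed, slice_max() can express "top 2 per group" without a separate arrange() step. This is a minimal sketch and not part of the original answer; with_ties = FALSE is chosen here so that each group returns at most 2 rows even when values are tied.

```r
library(dplyr)

# Same example data as in the question
data <- data.frame(
  ID    = c(1, 2, 3, 4, 5, 6, 7, 8),
  Value = c(5, 4, 7, 2, 6, 3, 9, 4),
  Group = c(1, 1, 1, 2, 3, 2, 2, 3)
)

# Keep the 2 rows with the largest Value in each Group.
# with_ties = FALSE caps the result at 2 rows per group.
Finaldata <- data %>%
  group_by(Group) %>%
  slice_max(Value, n = 2, with_ties = FALSE) %>%
  ungroup()

Finaldata
```

Dropping with_ties = FALSE falls back to the default (with_ties = TRUE), which may return more than 2 rows for a group when the 2nd highest value is tied.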

Or, using base R:

data[with(data, ave(-Value, Group, FUN=rank)%in% 1:2),]
#  ID Value Group
#1  1     5     1
#3  3     7     1
#5  5     6     3
#6  6     3     2
#7  7     9     2
#8  8     4     3
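
One caveat with the rank approach above: rank() uses ties.method = "average" by default, so tied values can get fractional ranks such as 1.5, which never match 1:2. The following is a minimal sketch on a small made-up data frame (the name tied and its values are illustrative assumptions, not from the question) showing the effect and one possible workaround using ties.method = "first".

```r
# Hypothetical data with a tie for the highest Value in the only group
tied <- data.frame(ID = 1:3, Value = c(7, 7, 3), Group = c(1, 1, 1))

# Default ranking: the two tied rows both get rank 1.5
r_default <- with(tied, ave(-Value, Group, FUN = rank))
top_default <- tied[r_default %in% 1:2, ]

# Breaking ties by order of appearance gives ranks 1, 2, 3
r_first <- with(tied, ave(-Value, Group,
                          FUN = function(v) rank(v, ties.method = "first")))
top_first <- tied[r_first <= 2, ]

nrow(top_default)  # no rows selected, because 1.5 is not in 1:2
top_first          # the first two rows
```

Whether ties should be broken arbitrarily or all tied rows kept depends on the use case; this sketch only shows the first option.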

Try:

ll = lapply(split(data, data$Group), function(x) tail(x[order(x$Value),],2) )
ll
$`1`
  ID Value Group
1  1     5     1
3  3     7     1

$`2`
  ID Value Group
6  6     3     2
7  7     9     2

$`3`
  ID Value Group
8  8     4     3
5  5     6     3

To bind to a data frame:

do.call(rbind, ll) 
    ID Value Group
1.1  1     5     1
1.3  3     7     1
2.6  6     3     2
2.7  7     9     2
3.8  8     4     3
3.5  5     6     3

or:

library(data.table)
rbindlist(ll)
   ID Value Group
1:  1     5     1
2:  3     7     1
3:  6     3     2
4:  7     9     2
5:  8     4     3
6:  5     6     3
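
The split / order / rbind steps above can also be wrapped into a small reusable function. This is only a sketch in base R: the name top_n_by_group and its arguments are hypothetical, not something from the original answers. It sorts in decreasing order and uses head() instead of tail(), so rows come out highest first, and it resets the row names that do.call(rbind, ...) would otherwise build from the list names.

```r
# Hypothetical helper: keep the n rows with the largest `value`
# within each level of `group`.
top_n_by_group <- function(df, value, group, n = 2) {
  pieces <- lapply(split(df, df[[group]]), function(x) {
    # Sort this group's rows by `value`, largest first, and keep the first n
    head(x[order(x[[value]], decreasing = TRUE), , drop = FALSE], n)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the "1.3"-style row names created by rbind
  out
}

# Same example data as in the question
data <- data.frame(
  ID    = c(1, 2, 3, 4, 5, 6, 7, 8),
  Value = c(5, 4, 7, 2, 6, 3, 9, 4),
  Group = c(1, 1, 1, 2, 3, 2, 2, 3)
)

Finaldata <- top_n_by_group(data, "Value", "Group", n = 2)
Finaldata
```

Passing the column names as strings keeps the helper independent of any variables named Value or Group in the global environment.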
