简体   繁体   中英

Indexing subgroups by sorted positions in R dataframe

I have a dataframe which contains information about several categories, and some associated variables. It is of the form:

   ID       category     sales    score
   227          A         109       21
   131          A         410       24
   131          A         509        1
   123          B           2       61
   545          B          19        5
   234          C         439      328
   654          C         765       41

What I would like to do is be able to introduce two new columns, salesRank and scoreRank , where I find the item index per category, had they been ordered by sales and score , respectively. I can solve the general case like this:

dF <- dF[order(-dF$sales),]
dF$salesRank<-seq.int(nrow(dF))

but this doesn't account for the categories and so far I've only solved this by breaking up the dataframe. What I want would result in the following:

   ID       category    sales    score    salesRank    scoreRank
   227          A         109       21         3            2
   131          A         410       24         2            1
   131          A         509        1         1            3
   123          B           2       61         2            1
   545          B          19        5         1            2
   234          C         439      328         2            1
   654          C         765       41         1            2

Many thanks!

Try:

library(dplyr)
df %>% 
  group_by(category) %>% 
  mutate(salesRank = row_number(desc(sales)),
         scoreRank = row_number(desc(score)))

Which gives:

#Source: local data frame [7 x 6]
#Groups: category
# 
#   ID category sales score salesRank scoreRank
#1 227        A   109    21         3         2
#2 131        A   410    24         2         1
#3 131        A   509     1         1         3
#4 123        B     2    61         2         1
#5 545        B    19     5         1         2
#6 234        C   439   328         2         1
#7 654        C   765    41         1         2

From the help:

  • row_number() : equivalent to rank(ties.method = "first")
  • min_rank() : equivalent to rank(ties.method = "min")
  • desc() : transform a vector into a format that will be sorted in descending order.

As @thelatemail pointed out, for this particular dataset you might want to use min_rank() instead of row_number() which will account for ties in sales/score more appropriately:

> row_number(c(1,2,2,4))
#[1] 1 2 3 4
> min_rank(c(1,2,2,4))
#[1] 1 2 2 4

Use ave in base R with rank (the - is to reverse the rankings from low-to-high to high-to-low):

dF$salesRank <- with(dF, ave(-sales, category, FUN=rank) )
#[1] 3 2 1 2 1 2 1
dF$scoreRank <- with(dF, ave(-score, category, FUN=rank) )
#[1] 2 1 3 1 2 1 2

I have just a base R solution with tapply .

salesRank <- tapply(dat$sales, dat$category, order, decreasing = T)
scoreRank <- tapply(dat$score, dat$category, order, decreasing = T)

cbind(dat, salesRank = unlist(salesRank), scoreRank= unlist(scoreRank))

    ID category sales score salesRank scoreRank
A1 227        A   109    21         3         2
A2 131        A   410    24         2         1
A3 131        A   509     1         1         3
B1 123        B     2    61         2         1
B2 545        B    19     5         1         2
C1 234        C   439   328         2         1
C2 654        C   765    41         1         2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM