
Sorting rows in dataframe based on custom identifier

I have a dataframe of student names and code identifiers (P and V):

StudName     Code     ID
John           P      a1
Sam            V      a2
John           V      a3
Alex           P      a4
Sam            P      a5
Alex           V      a6
Stuart         P      a7
John           V      a8

I want to rearrange the rows for each student so that Code determines the priority (V before P). That is, for each student, any row with code V should be placed above the rows with code P.

For example, John has both codes P and V, and since V has priority over P, his V rows are placed above his P row:

StudName     Code      ID
John         V         a3
John         V         a8
John         P         a1

The resulting dataframe would be:

StudName     Code     ID
John         V        a3
John         V        a8
John         P        a1
Sam          V        a2
Sam          P        a5
Alex         V        a6
Alex         P        a4
Stuart       P        a7

Hence, for each student, rows with code V always come first, followed by rows with code P.

I would appreciate some help with this.

Edit: updated the example to include a student with more than one P or V.

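Here is a base R solution. The first factor keeps the students in order of first appearance; the second ranks V ahead of P:

dfout <- df[order(factor(df$StudName, levels = unique(df$StudName)),
                  factor(df$Code, levels = c("V","P"))),]

which gives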

> dfout
  StudName Code ID
3     John    V a3
8     John    V a8
1     John    P a1
2      Sam    V a2
5      Sam    P a5
6     Alex    V a6
4     Alex    P a4
7   Stuart    P a7
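To see why the two factors produce this ordering, it can help to look at the sort keys separately. The following is only a sketch: the data frame is rebuilt inline so the snippet runs on its own, and the names `name_key`, `code_key` and `idx` are illustrative, not part of the answer above.

```r
df <- data.frame(
  StudName = c("John", "Sam", "John", "Alex", "Sam", "Alex", "Stuart", "John"),
  Code     = c("P", "V", "V", "P", "P", "V", "P", "V"),
  ID       = paste0("a", 1:8),
  stringsAsFactors = FALSE
)

# Key 1: students ranked by first appearance (John = 1, Sam = 2, Alex = 3, Stuart = 4)
name_key <- as.integer(factor(df$StudName, levels = unique(df$StudName)))

# Key 2: codes ranked by priority (V = 1, P = 2)
code_key <- as.integer(factor(df$Code, levels = c("V", "P")))

# order() sorts by name_key first and uses code_key to break ties;
# rows that are still tied keep their original relative order
idx <- order(name_key, code_key)
df[idx, ]
```

Here `idx` holds the original row numbers in their new order, which is why the row names in the printed result above are 3, 8, 1, 2, 5, 6, 4, 7.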

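Alternatively, split the data by student, sort each piece, and bind the pieces back together. Note that this relies on "V" sorting after "P" alphabetically, so decreasing = TRUE puts V first:

dfout <- do.call(rbind,
                 c(make.row.names = FALSE,
                   lapply(split(df, factor(df$StudName, levels = unique(df$StudName))),
                          function(x) x[order(x$Code, decreasing = TRUE),])))

which gives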

> dfout
  StudName Code ID
1     John    V a3
2     John    V a8
3     John    P a1
4      Sam    V a2
5      Sam    P a5
6     Alex    V a6
7     Alex    P a4
8   Stuart    P a7
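Because the split-and-bind variant depends on the alphabetical order of the codes, a version with an explicit priority may be easier to adapt. This is a sketch only: `priority`, `groups`, `sorted` and `dfout2` are hypothetical names, and `match()` is swapped in for the decreasing sort.

```r
df <- data.frame(
  StudName = c("John", "Sam", "John", "Alex", "Sam", "Alex", "Stuart", "John"),
  Code     = c("P", "V", "V", "P", "P", "V", "P", "V"),
  ID       = paste0("a", 1:8),
  stringsAsFactors = FALSE
)

# Explicit priority: earlier entries sort first
priority <- c("V", "P")

# One data frame per student, in order of first appearance
groups <- split(df, factor(df$StudName, levels = unique(df$StudName)))

# Within each student, order rows by the position of Code in `priority`
sorted <- lapply(groups, function(x) x[order(match(x$Code, priority)), ])

# Bind the pieces back together with fresh row names 1..n
dfout2 <- do.call(rbind, c(sorted, make.row.names = FALSE))
dfout2
```

Changing `priority` is then the only edit needed to use a different ranking of codes.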

DATA

df <- structure(list(StudName = c("John", "Sam", "John", "Alex", "Sam", 
"Alex", "Stuart", "John"), Code = c("P", "V", "V", "P", "P", 
"V", "P", "V"), ID = c("a1", "a2", "a3", "a4", "a5", "a6", "a7", 
"a8")), class = "data.frame", row.names = c(NA, -8L))

Here is an option with dplyr, using Code != 'V' as the second sort key:

library(dplyr)
df %>% 
    arrange(factor(StudName, levels = unique(StudName)), 
      Code != 'V')
#  StudName Code ID
#1     John    V a3
#2     John    V a8
#3     John    P a1
#4      Sam    V a2
#5      Sam    P a5
#6     Alex    V a6
#7     Alex    P a4
#8   Stuart    P a7
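The `Code != 'V'` key works because a logical vector sorts with FALSE before TRUE, so rows where the code is V come first. A small base R sketch of just that step (the vectors `code` and `key` are made up for illustration):

```r
code <- c("P", "V", "V", "P")

# FALSE for V rows, TRUE for every other code
key <- code != "V"

# FALSE sorts ahead of TRUE, and tied elements keep their original order
ord <- order(key)
code[ord]
```

With only two codes this is enough; with more than two, every non-V code would be tied, so an explicit ranking would be needed instead.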

