I would like to keep the best 2 quiz results for each student (ranked by attendance first, then by score) and drop the weakest of the 3 quizzes. In other words, I want to pick the best 2 of 3 columns for each row, then create a new data frame with StudentID, ExamQuiz1, ExamQuiz2, ExamMidterm and ExamFinal. I could do this by looping over the table, but I assume that is inefficient in R. What is an efficient way to do this with the dplyr package?
Minimalist data
The sample data frame is at the bottom. "G" means the student did not attend the exam, so I would like to keep that value rather than replace it with 0. For instance, given G (ExamQuiz1), 0 (ExamQuiz2), 10 (ExamQuiz3), I have to choose 0 as ExamQuiz1 and 10 as ExamQuiz2 for the quiz inputs, because 0 is better than G in terms of attendance: a numeric result means the student attended. Every cell under the columns ExamQuiz1, ExamQuiz2, ExamMidterm and ExamFinal may hold a number (the exam result) or the character value "G" (not attended). I will not touch any values in the ExamMidterm and ExamFinal columns. The task only concerns the columns ExamQuiz1, ExamQuiz2, and ExamQuiz3.
  StudentID ExamQuiz1 ExamQuiz2 ExamQuiz3 ExamMidterm ExamFinal
1     11111         0         G         G           G         G
2     22222         0         G        43          71        18
3     33333         0         G         G           G         G
4     44444         0         G         G           G         G
5     55555        60        38         G          64        27
6     66666         0         G         G           G         G
Edit: Some commenters still keep pointing out that the data is not tidy. As I explained in the comments, the reasons given, and the suggested ways to tidy it, do not make sense to me. For that reason, I have added more explanation to the question body without changing the structure of the data.
A base R solution
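cbind(df[-(2:4)], t(apply(df[2:4], 1, function(x) {
  # G values first, then scores sorted numerically (so "5" ranks below "43");
  # dropping the first element removes the weakest quiz
  c(x[x == "G"], sort(as.numeric(x[x != "G"])))[-1]
})))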
#   StudentID Midterm Final  1  2
# 1     11111       G     G  G  0
# 2     22222      71    18  0 43
# 3     33333       G     G  G  0
# 4     44444       G     G  G  0
# 5     55555      64    27 38 60
# 6     66666       G     G  G  0
Under your rule, G ranks below any numeric score. So I first put all existing G values at the beginning of a vector and then append the sorted scores. After removing the first element of the vector, the top 2 results remain.
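To see the rule on a single row, here is a minimal sketch using the scenario from the question (G, 0, 10). The helper name best_two is hypothetical, and converting the scores to numbers before sorting is an assumption on my side, made so that a string like "5" is not ranked above "43".

```r
# best_two() is a hypothetical helper, not part of any package.
# x: character vector of quiz results, where "G" means "not attended".
best_two <- function(x) {
  absent <- x[x == "G"]                     # every G goes to the front
  scores <- sort(as.numeric(x[x != "G"]))   # real scores, ascending
  ranked <- c(absent, scores)               # worst to best, coerced to character
  ranked[-1]                                # drop the single weakest entry
}

best_two(c("G", "0", "10"))   # "0" "10"
best_two(c("0", "G", "G"))    # "G" "0"
```

The result is ordered from weaker to stronger, which matches the column order in the output above.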
Here's an approach with dplyr's new across (version 1.0.0 or higher):
Assuming no one can get a negative score and being absent is worse than getting zero, we can just set G to be -1.
library(dplyr)

data %>%
  # Encode "G" as -1 so it ranks below every real score
  mutate(across(-StudentID, ~case_when(. == "G" ~ -1,
                                       TRUE ~ as.numeric(.)))) %>%
  rowwise() %>%
  mutate(TopQuiz = max(c_across(starts_with("Quiz"))),
         SecondQuiz = sort(c_across(starts_with("Quiz")),
                           decreasing = TRUE)[2]) %>%
  dplyr::select(StudentID, TopQuiz, SecondQuiz, Midterm, Final) %>%
  # Decode -1 back to "G" and return to character columns
  mutate(across(-StudentID, ~case_when(. == -1 ~ "G",
                                       TRUE ~ as.character(.))))
# # A tibble: 6 x 5
# # Rowwise:
#   StudentID TopQuiz SecondQuiz Midterm Final
#       <int> <chr>   <chr>      <chr>   <chr>
# 1     11111 0       G          G       G
# 2     22222 43      0          71      18
# 3     33333 0       G          G       G
# 4     44444 0       G          G       G
# 5     55555 60      38         64      27
# 6     66666 0       G          G       G
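The -1 encoding can also be checked on one row at a time in base R. The sketch below is only an illustration: top_two_sentinel is a hypothetical name, and it still relies on the stated assumption that no real score is negative. It wraps as.numeric in suppressWarnings because converting "G" would otherwise produce a coercion warning.

```r
# top_two_sentinel() is a hypothetical helper illustrating the -1 encoding.
# x: character vector of quiz results, where "G" means "not attended".
top_two_sentinel <- function(x) {
  # Encode: "G" becomes -1, everything else becomes its numeric value
  num <- ifelse(x == "G", -1, suppressWarnings(as.numeric(x)))
  # Rank: the two largest values, best first
  top <- sort(num, decreasing = TRUE)[1:2]
  # Decode: -1 goes back to "G", scores go back to character
  ifelse(top == -1, "G", as.character(top))
}

top_two_sentinel(c("G", "0", "10"))   # "10" "0"
top_two_sentinel(c("0", "G", "G"))    # "0" "G"
```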
A slightly different way of using dplyr and stringr: convert G to NA to do the math, then turn NA back into G and convert back to character.
library(dplyr)
library(stringr)

newgrades <- grades %>%
  mutate(across(starts_with("Quiz"), ~ str_replace(., "G", NA_character_))) %>%
  mutate(across(starts_with("Quiz"), as.numeric)) %>%
  rowwise() %>%
  mutate(TopQuiz = max(c_across(starts_with("Quiz")), na.rm = TRUE),
         NextBestQuiz = sort(c_across(starts_with("Quiz")),
                             decreasing = TRUE)[2]) %>%
  mutate(across(ends_with("Quiz"), as.character)) %>%
  mutate(across(ends_with("Quiz"), ~ str_replace_na(., replacement = "G"))) %>%
  select(id, TopQuiz, NextBestQuiz, Midterm, Final)
newgrades
#> # A tibble: 6 x 5
#> # Rowwise:
#>      id TopQuiz NextBestQuiz Midterm Final
#>   <int> <chr>   <chr>        <chr>   <chr>
#> 1     1 0       G            G       G
#> 2     2 43      0            71      18
#> 3     3 0       G            G       G
#> 4     4 0       G            G       G
#> 5     5 60      38           64      27
#> 6     6 0       G            G       G
Your data
grades <- data.frame(
  id      = c(1:6),
  Quiz1   = c("0", "0", "0", "0", "60", "0"),
  Quiz2   = c("G", "G", "G", "G", "38", "G"),
  Quiz3   = c("G", "43", "G", "G", "G", "G"),
  Midterm = c("G", "71", "G", "G", "64", "G"),
  Final   = c("G", "18", "G", "G", "27", "G")
)
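The NA-based idea in this answer can be tried on single rows without dplyr. This is a rough sketch rather than a replacement for the pipeline: top_two_na is a hypothetical name. It takes both values from the sorted vector instead of calling max, because max(..., na.rm = TRUE) on a row where every quiz is G returns -Inf with a warning, a case that does not occur in the sample data.

```r
# top_two_na() is a hypothetical helper illustrating the NA encoding.
# x: character vector of quiz results, where "G" means "not attended".
top_two_na <- function(x) {
  num <- as.numeric(replace(x, x == "G", NA))  # "G" becomes NA, no warning
  top <- sort(num, decreasing = TRUE)[1:2]     # sort() drops NA; [1:2] pads with NA
  out <- as.character(top)
  out[is.na(out)] <- "G"                       # NA goes back to "G"
  out
}

top_two_na(c("0", "G", "G"))    # "0" "G"
top_two_na(c("60", "38", "G"))  # "60" "38"
```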