Manipulating columns based on other column values with the R dplyr package

Question

I would like to keep the best 2 of 3 quiz results for each student (highest score, with attendance taken into account) and drop the weakest quiz. In other words, I want to choose the best 2 of 3 columns for each row, and then create a new data frame containing StudentID, ExamQuiz1, ExamQuiz2, ExamMidterm and ExamFinal. I could loop through the table, but I assume that is too inefficient in R. What is an efficient way to do this with the dplyr package?

Minimal data

The sample data frame is shown below. "G" means the student did not attend the exam, so I would like to keep that value rather than replace it with 0. For instance, given G (ExamQuiz1), 0 (ExamQuiz2), 10 (ExamQuiz3), I have to choose 0 as ExamQuiz1 and 10 as ExamQuiz2, because 0 is better than G in terms of attendance. A numeric result means the student attended the exam. Every cell in the ExamQuiz1, ExamQuiz2, ExamQuiz3, ExamMidterm and ExamFinal columns may hold either a numeric value (the exam result) or a character value ("G", not attended). I will not touch any values in the ExamMidterm and ExamFinal columns. The task only concerns the columns ExamQuiz1, ExamQuiz2 and ExamQuiz3.

   StudentID  ExamQuiz1  ExamQuiz2  ExamQuiz3  ExamMidterm  ExamFinal
1      11111          0          G          G            G          G
2      22222          0          G         43           71         18
3      33333          0          G          G            G          G
4      44444          0          G          G            G          G
5      55555         60         38          G           64         27
6      66666          0          G          G            G          G

Edit: Some commenters keep pointing out that the data is not tidy. As I explained in the comments, neither the reasons given nor the suggested ways of tidying it make sense for my case. I have therefore added more explanation to the question body without changing the structure of the data.

Answer 1

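A base R solution

# For each row, put "G" (absent) first, then the scores in ascending numeric
# order, and drop the first (weakest) element to keep the best two.
cbind(df[-(2:4)], t(apply(df[2:4], 1, function(x){
  c(x[x == "G"], sort(as.numeric(x[x != "G"])))[-1]
})))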

#   StudentID Midterm Final  1  2
# 1     11111       G     G  G  0
# 2     22222      71    18  0 43
# 3     33333       G     G  G  0
# 4     44444       G     G  G  0
# 5     55555      64    27 38 60
# 6     66666       G     G  G  0

Under your rule, G ranks below any numeric score. So I first put all existing G values at the beginning of a vector and then append the scores sorted in ascending order. After removing the first element of that vector, the top 2 results remain.
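To see the ordering step in isolation, here is a minimal sketch applied to one row's quiz values. The helper name `best_two` is hypothetical (it is not part of the answer), and the sketch assumes exactly three quiz values per row and converts the scores to numeric before sorting.

```r
# Hypothetical helper: keep the best two of three quiz values,
# ranking "G" (absent) below every numeric score.
best_two <- function(x) {
  absent <- x[x == "G"]                    # all "G" entries go first
  scores <- sort(as.numeric(x[x != "G"]))  # numeric scores, ascending
  c(absent, scores)[-1]                    # drop the weakest entry
}

best_two(c("G", "0", "10"))   # "0"  "10"
best_two(c("60", "38", "G"))  # "38" "60"
best_two(c("0", "G", "G"))    # "G"  "0"
```

Converting to numeric before sorting matters because sorting character strings would place "100" before "60".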

Answer 2

Here's an approach with dplyr's new across() (version 1.0.0 or higher):

Assuming no one can get a negative score and that being absent is worse than getting zero, we can temporarily set G to -1.

library(dplyr)
data %>% 
  mutate(across(-StudentID, ~case_when(. == "G" ~ -1,
                                       TRUE ~ as.numeric(.)))) %>%
  rowwise() %>%
  mutate(TopQuiz = max(c_across(starts_with("Quiz"))),
         SecondQuiz = sort(c_across(starts_with("Quiz")),
                           decreasing = TRUE)[2]) %>%
  dplyr::select(StudentID, TopQuiz, SecondQuiz, Midterm, Final) %>%
  mutate(across(-StudentID, ~case_when(. == -1 ~ "G",
                                       TRUE ~ as.character(.))))
##A tibble: 6 x 5
## Rowwise: 
#  StudentID TopQuiz SecondQuiz Midterm Final
#      <int> <chr>   <chr>      <chr>   <chr>
#1     11111 0       G          G       G    
#2     22222 43      0          71      18   
#3     33333 0       G          G       G    
#4     44444 0       G          G       G    
#5     55555 60      38         64      27   
#6     66666 0       G          G       G
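The sentinel idea can be checked on a single row in base R. This is only a sketch on a made-up vector `quiz` (an assumption for illustration, not part of the pipeline above); `suppressWarnings()` hides the coercion warning that `as.numeric("G")` would otherwise raise.

```r
# Made-up quiz values for one student; "G" means absent.
quiz <- c("0", "G", "43")

# Encode "G" as -1 so it sorts below every real score.
num <- ifelse(quiz == "G", -1, suppressWarnings(as.numeric(quiz)))

# Take the two largest values, then decode -1 back to "G".
top_two <- sort(num, decreasing = TRUE)[1:2]
result <- ifelse(top_two == -1, "G", as.character(top_two))
result  # "43" "0"
```

The trick only holds while -1 can never be a genuine score.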

Answer 3

A slightly different way of applying dplyr and stringr: convert G to NA to do the math, then turn NA back into G and return to character.

library(dplyr)
library(stringr)


newgrades <- grades %>% 
  mutate(across(starts_with("Quiz"), ~ str_replace(., "G", NA_character_))) %>%
  mutate(across(starts_with("Quiz"), as.numeric)) %>%
  rowwise() %>%
  mutate(TopQuiz = max(c_across(starts_with("Quiz")), na.rm = TRUE),
         NextBestQuiz = sort(c_across(starts_with("Quiz")),
                             decreasing = TRUE)[2]) %>%
  mutate(across(ends_with("Quiz"), as.character)) %>%
  mutate(across(ends_with("Quiz"), ~ str_replace_na(., replacement = "G"))) %>%
  select(id, TopQuiz, NextBestQuiz, Midterm, Final)

newgrades
#> # A tibble: 6 x 5
#> # Rowwise: 
#>      id TopQuiz NextBestQuiz Midterm Final
#>   <int> <chr>   <chr>        <chr>   <chr>
#> 1     1 0       G            G       G    
#> 2     2 43      0            71      18   
#> 3     3 0       G            G       G    
#> 4     4 0       G            G       G    
#> 5     5 60      38           64      27   
#> 6     6 0       G            G       G
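The NA approach relies on two base R behaviours, sketched here on a made-up numeric vector `quiz` (an assumption for illustration): `sort()` drops NA values by default, and indexing past the end of a vector returns NA.

```r
# Made-up quiz values for one student after "G" has become NA.
quiz <- c(0, NA, NA)

top    <- max(quiz, na.rm = TRUE)          # 0: NAs are ignored
sorted <- sort(quiz, decreasing = TRUE)    # 0: NAs are removed by default
second <- sorted[2]                        # NA: no second score exists

top
second
```

That NA is what later gets relabelled as "G". One caveat under the same assumptions: for a student who is absent from all three quizzes, `max(..., na.rm = TRUE)` returns -Inf with a warning, so that case would need separate handling.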

Your data

grades <- data.frame(
  id = c(1:6),
  Quiz1 = c("0","0","0","0","60","0"),
  Quiz2 = c("G","G","G","G","38","G"),
  Quiz3 = c("G","43","G","G","G","G"),
  Midterm = c("G","71","G","G","64","G"),
  Final = c("G","18","G","G","27","G")
)