In R, how do I get sums for all combinations of two variables?

Question

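I have a long dataset of student grades and subjects. I want to keep the data in long format, but add columns that tell me how many Fs a student has in their humanities classes (English and history) and in their STEM classes (biology and math). I want the same for Ds, Cs, Bs, and As.

I know I could spell each of these out explicitly, but in the future there may be additional subjects (say, chemistry added to STEM) or entirely different categories, such as foreign languages, so I want the solution to be scalable.

I know how to get all the combinations of columns, and I know how to do each piece by hand, but I don't know how to combine the two. Any help would be much appreciated!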

#Sample data
library(tidyverse)

student_grades <- tibble(student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
                      subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
                      grade = c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4))

#All combinations of grades and subjects
all_subject_combos <- c("eng|his", "bio|math")
all_grades <- c("F", "D", "C", 
             "B", "A")

subjects_and_letter_grades <- expand.grid(all_subject_combos, all_grades)

all_combos <- subjects_and_letter_grades %>%
  unite("names", c(Var1, Var2)) %>%
  mutate(names = str_replace_all(names, "\\|", "_")) %>%
  pull(names)

#Manual generation of numbers of Fs by subject
#This is what I want the results to look like, but with all other letter grades

student_grades %>%
  group_by(student_id) %>%
  mutate(
    eng_his_F = sum(case_when(
      str_detect(subject, "eng|his") & grade == 1 ~ 1,
      TRUE ~ 0), na.rm = TRUE),
    bio_math_F = sum(case_when(
      str_detect(subject, "bio|math") & grade == 1 ~ 1,
      TRUE ~ 0), na.rm = TRUE)) %>%
  ungroup()

Ideally, this would scale to any number of subject combinations and would not require me to write the same code again for Ds, Cs, Bs, and As. Thanks!

Answer 1

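We can map over the all_combos vector. For each element, group by 'student_id' (this could also be done once outside the loop, storing the grouped object for reuse), create a new column named after the current element by unquoting it with !! and assigning with the := operator, where the value is the sum of the case_when result, and then bind the new columns to the original data.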

library(dplyr)
library(purrr)
library(stringr)
map_dfc(all_combos, ~ student_grades %>% 
  group_by(student_id) %>%
  transmute(!! .x := sum(case_when(str_detect(subject,
   str_replace(.x, "(\\w+)_(\\w+)_.", "\\1|\\2")) &
    grade == match(str_extract(.x, ".$"), all_grades)~ 1, TRUE ~ 0))) %>%
  ungroup %>% 
  dplyr::select(-student_id)) %>%
  bind_cols(student_grades, .)

-output

# A tibble: 18 × 13
   student_id subject grade eng_his_F bio_math_F eng_his_D bio_math_D eng_his_C bio_math_C eng_his_B bio_math_B eng_hi…¹ bio_m…²
        <dbl> <chr>   <dbl>     <dbl>      <dbl>     <dbl>      <dbl>     <dbl>      <dbl>     <dbl>      <dbl>    <dbl>   <dbl>
 1          1 english     1         1          0         0          1         0          1         1          0        0       0
 2          1 biology     2         1          0         0          1         0          1         1          0        0       0
 3          1 math        3         1          0         0          1         0          1         1          0        0       0
 4          1 history     4         1          0         0          1         0          1         1          0        0       0
 5          2 english     5         0          0         1          0         0          1         0          1        1       0
 6          2 biology     4         0          0         1          0         0          1         0          1        1       0
 7          2 math        3         0          0         1          0         0          1         0          1        1       0
 8          2 history     2         0          0         1          0         0          1         0          1        1       0
 9          3 english     2         1          1         1          0         0          0         0          1        0       0
10          3 biology     4         1          1         1          0         0          0         0          1        0       0
11          3 math        1         1          1         1          0         0          0         0          1        0       0
12          3 history     1         1          1         1          0         0          0         0          1        0       0
13          4 english     1         1          1         0          1         1          0         0          0        0       0
14          4 biology     1         1          1         0          1         1          0         0          0        0       0
15          4 math        2         1          1         0          1         1          0         0          0        0       0
16          4 history     3         1          1         0          1         1          0         0          0        0       0
17          5 <NA>        3         0          0         0          0         0          0         0          1        0       0
18          5 biology     4         0          0         0          0         0          0         0          1        0       0
# … with abbreviated variable names ¹eng_his_A, ²bio_math_A
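
The remark above about grouping once outside the loop can be sketched as follows. This is a minimal variant rather than the answer's own code: it assumes the subject pattern and letter grade are kept as separate columns of a helper data frame (here called combos, a name introduced only for illustration), so they do not have to be parsed back out of the column name with a regex. It also sums a logical vector directly, so the new columns come out as integers rather than doubles.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

student_grades <- tibble(
  student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
  subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
  grade = c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4)
)

all_subject_combos <- c("eng|his", "bio|math")
all_grades <- c("F", "D", "C", "B", "A")  # position = numeric grade, so F is 1

# One row per (subject pattern, letter grade) pair, plus the column name to create.
# `combos` is a hypothetical helper, not part of the original answer.
combos <- expand.grid(pattern = all_subject_combos, letter = all_grades,
                      stringsAsFactors = FALSE) %>%
  mutate(name = paste(str_replace_all(pattern, fixed("|"), "_"), letter, sep = "_"))

# Group once, outside the loop, and reuse the grouped object
grouped <- student_grades %>% group_by(student_id)

# For each row of `combos`, build one per-student count column
new_cols <- pmap_dfc(combos, function(pattern, letter, name) {
  grouped %>%
    transmute(!!name := sum(str_detect(subject, pattern) &
                              grade == match(letter, all_grades), na.rm = TRUE)) %>%
    ungroup() %>%
    select(-student_id)
})

result <- bind_cols(student_grades, new_cols)
```

Adding a subject group (for example a foreign-language pattern) then only means adding one entry to all_subject_combos.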

Answer 2

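Here is another way to look at it. I use a small lookup table (subject_to_field) that maps each subject to its field (english -> Humanities, math -> STEM, and so on). I think this may help with scalability. You will need to maintain this table as subjects are added or removed.

Then left_join attaches the field to the student_grades tibble.

Adding the "grade2" column is not required, but it improves readability. After that, all that is left is the appropriate grouping and counting. With this approach, you do not get zero counts for grades a student never received. Note that the grade2 mapping below labels 1 as "A", whereas the question's code treats 1 as an F; reverse the labels if you follow the question's convention.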

library(tidyverse)

student_grades <- tibble(student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
                         subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
                         grade = c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4))

student_grades <- student_grades %>%
  mutate(grade2 = case_when(
    grade == 1 ~ "A",
    grade == 2 ~ "B", 
    grade == 3 ~ "C", 
    grade == 4 ~ "D", 
    grade == 5 ~ "F"))

subject_to_field <- tibble(
  subject = c("biology", "english", "history", "math"),
  field = c("STEM", "Humanities", "Humanities", "STEM")
)

student_grades <- student_grades %>%
  left_join(subject_to_field, by = c("subject" = "subject"))


student_summary <- student_grades %>%
  group_by(student_id, field, subject, grade2) %>%
  summarise(count = n())

This gives you the following output:

> student_summary
# A tibble: 18 × 5
# Groups:   student_id, field, subject [18]
   student_id field      subject grade2 count
        <dbl> <chr>      <chr>   <chr>  <int>
 1          1 Humanities english A          1
 2          1 Humanities history D          1
 3          1 STEM       biology B          1
 4          1 STEM       math    C          1
 5          2 Humanities english F          1
 6          2 Humanities history B          1
 7          2 STEM       biology D          1
 8          2 STEM       math    C          1
 9          3 Humanities english B          1
10          3 Humanities history A          1
11          3 STEM       biology D          1
12          3 STEM       math    A          1
13          4 Humanities english A          1
14          4 Humanities history C          1
15          4 STEM       biology A          1
16          4 STEM       math    B          1
17          5 STEM       biology D          1
18          5 NA         NA      C          1
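
If zero counts and the original long layout are both wanted, the lookup-table idea can be extended along the following lines. This is a sketch under stated assumptions, not part of the answer: it uses the question's convention that grade 1 is an F (the reverse of the grade2 mapping above), counts per field rather than per subject, and relies on tidyr's complete() and pivot_wider() to fill in the missing combinations and spread them into columns. The names counts_wide and result are introduced only for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

student_grades <- tibble(
  student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
  subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
  grade = c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4)
)

# Assumption: numeric grade indexes into this vector, so 1 = F and 5 = A
all_grades <- c("F", "D", "C", "B", "A")

# Lookup table: add rows here when subjects or fields change
subject_to_field <- tibble(
  subject = c("biology", "english", "history", "math"),
  field = c("STEM", "Humanities", "Humanities", "STEM")
)

counts_wide <- student_grades %>%
  inner_join(subject_to_field, by = "subject") %>%   # drops rows whose subject is not in the table
  mutate(letter = factor(all_grades[grade], levels = all_grades)) %>%
  count(student_id, field, letter) %>%
  # add the student/field/letter combinations that never occurred, with a count of 0
  complete(student_id, field, letter, fill = list(n = 0L)) %>%
  pivot_wider(names_from = c(field, letter), values_from = n, names_sep = "_")

# Attach the per-student counts to every row of the long data
result <- student_grades %>%
  left_join(counts_wide, by = "student_id")
```

A student whose subjects are all missing from the lookup table would get NA rather than 0 in the count columns after the final join, so that case would need separate handling.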

Answer 1
2 accepted 2023-01-25 00:03:28

Answer 2
0 2023-01-25 00:23:02
