In R, how can I get the sum for all combinations of two variables?

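I have a long dataset of student grades and subjects. I want to keep the dataset long, but I want to add columns that tell me how many Fs a student has in their humanities classes (English and history) and in their STEM classes (biology and math). I want the same for Ds, Cs, Bs, and As.

I know I could spell this out explicitly, but in the future there may be other subjects (for example, chemistry added to STEM) or entirely different categories, such as foreign languages, so I would like the solution to be extensible.

I know how to get all combinations of the columns, and I know how to work out each piece by hand, but I don't know how to combine the two. Any help would be greatly appreciated!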

#Sample data
library(tidyverse)

student_grades <- tibble(student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
                      subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
                      grade = c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4))
#All combinations of grades and subjects
all_subject_combos <- c("eng|his", "bio|math")
all_grades <- c("F", "D", "C", 
             "B", "A")

subjects_and_letter_grades <- expand.grid(all_subject_combos, all_grades)

all_combos <- subjects_and_letter_grades %>%
  unite("names", c(Var1, Var2)) %>%
  mutate(names = str_replace_all(names, "\\|", "_")) %>%
  pull(names)
#Manual generation of numbers of Fs by subject
#This is what I want the results to look like, but with all other letter grades

student_grades %>%
  group_by(student_id) %>%
  mutate(
    eng_his_F = sum(case_when(
      str_detect(subject, "eng|his") & grade == 1 ~ 1,
      TRUE ~ 0), na.rm = TRUE),
    bio_math_F = sum(case_when(
      str_detect(subject, "bio|math") & grade == 1 ~ 1,
      TRUE ~ 0), na.rm = TRUE)) %>%
  ungroup()

Ideally, this would scale to any number of subject combinations and would not require me to write the same code again for Ds, Cs, Bs, and As. Thanks!

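We can map over the all_combos vector. For each element, group the data by 'student_id' (this could also be done once outside the loop, storing the grouped data in an object to reuse here), create a new column named after the current element by unquoting it ( !! ) on the left of the := operator, with the sum of the case_when result as its value, and finally bind the new columns to the original data.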

library(dplyr)
library(purrr)
library(stringr)
map_dfc(all_combos, ~ student_grades %>% 
  group_by(student_id) %>%
  transmute(!! .x := sum(case_when(str_detect(subject,
   str_replace(.x, "(\\w+)_(\\w+)_.", "\\1|\\2")) &
    grade == match(str_extract(.x, ".$"), all_grades)~ 1, TRUE ~ 0))) %>%
  ungroup %>% 
  dplyr::select(-student_id)) %>%
  bind_cols(student_grades, .)

-output

# A tibble: 18 × 13
   student_id subject grade eng_his_F bio_math_F eng_his_D bio_math_D eng_his_C bio_math_C eng_his_B bio_math_B eng_hi…¹ bio_m…²
        <dbl> <chr>   <dbl>     <dbl>      <dbl>     <dbl>      <dbl>     <dbl>      <dbl>     <dbl>      <dbl>    <dbl>   <dbl>
 1          1 english     1         1          0         0          1         0          1         1          0        0       0
 2          1 biology     2         1          0         0          1         0          1         1          0        0       0
 3          1 math        3         1          0         0          1         0          1         1          0        0       0
 4          1 history     4         1          0         0          1         0          1         1          0        0       0
 5          2 english     5         0          0         1          0         0          1         0          1        1       0
 6          2 biology     4         0          0         1          0         0          1         0          1        1       0
 7          2 math        3         0          0         1          0         0          1         0          1        1       0
 8          2 history     2         0          0         1          0         0          1         0          1        1       0
 9          3 english     2         1          1         1          0         0          0         0          1        0       0
10          3 biology     4         1          1         1          0         0          0         0          1        0       0
11          3 math        1         1          1         1          0         0          0         0          1        0       0
12          3 history     1         1          1         1          0         0          0         0          1        0       0
13          4 english     1         1          1         0          1         1          0         0          0        0       0
14          4 biology     1         1          1         0          1         1          0         0          0        0       0
15          4 math        2         1          1         0          1         1          0         0          0        0       0
16          4 history     3         1          1         0          1         1          0         0          0        0       0
17          5 <NA>        3         0          0         0          0         0          0         0          1        0       0
18          5 biology     4         0          0         0          0         0          0         0          1        0       0
# … with abbreviated variable names ¹​eng_his_A, ²​bio_math_A
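
To see what the two string helpers inside the case_when are doing, it can help to run them on a few column names by themselves. The following is only an illustrative sketch: it assumes all_grades is ordered as in the question (so that "F" sits at position 1), and the names combo_names, subject_pattern and grade_value are made up for this example.

```r
library(stringr)

all_grades  <- c("F", "D", "C", "B", "A")
combo_names <- c("eng_his_F", "bio_math_D", "eng_his_A")

# Turn a name such as "eng_his_F" back into the regex "eng|his"
# that str_detect() matches against the subject column
subject_pattern <- str_replace(combo_names, "(\\w+)_(\\w+)_.", "\\1|\\2")

# The last character is the letter grade; its position in all_grades
# is the numeric grade that the row's grade is compared with
grade_value <- match(str_extract(combo_names, ".$"), all_grades)

data.frame(combo_names, subject_pattern, grade_value)
```

Note that the pattern relies on each name holding exactly two subject stems. Because \\w also matches an underscore, a three-subject name such as bio_math_chem_F would be turned into bio_math|chem, so the regex would need adjusting before adding a third subject to a group.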

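Here is another way to look at it. I use a small mapping table (subject_to_field) to map each subject to its field (english -> Humanities, math -> STEM, and so on). I think this may help with extensibility. You will need to maintain this table as subjects are added or removed.

A left_join then attaches the field to the student_grades tibble.

Adding the "grade2" column is not required, but it improves readability. After that, all we need to do is group appropriately and count. With this approach, you will not get zero counts for grades that a student never received.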

library(tidyverse)

student_grades <- tibble(student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
                         subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
                         grade = c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4))

# Letter grades follow the question's coding: 1 = F, ..., 5 = A
student_grades <- student_grades %>%
  mutate(grade2 = case_when(
    grade == 1 ~ "F",
    grade == 2 ~ "D", 
    grade == 3 ~ "C", 
    grade == 4 ~ "B", 
    grade == 5 ~ "A"))

subject_to_field <- tibble(
  subject = c("biology", "english", "history", "math"),
  field = c("STEM", "Humanities", "Humanities", "STEM")
)

student_grades <- student_grades %>%
  left_join(subject_to_field, by = c("subject" = "subject"))


student_summary <- student_grades %>%
  group_by(student_id, field, subject, grade2) %>%
  summarise(count = n())

This gives you the following output:

> student_summary
# A tibble: 18 × 5
# Groups:   student_id, field, subject [18]
   student_id field      subject grade2 count
        <dbl> <chr>      <chr>   <chr>  <int>
 1          1 Humanities english F          1
 2          1 Humanities history B          1
 3          1 STEM       biology D          1
 4          1 STEM       math    C          1
 5          2 Humanities english A          1
 6          2 Humanities history D          1
 7          2 STEM       biology B          1
 8          2 STEM       math    C          1
 9          3 Humanities english D          1
10          3 Humanities history F          1
11          3 STEM       biology B          1
12          3 STEM       math    F          1
13          4 Humanities english F          1
14          4 Humanities history C          1
15          4 STEM       biology F          1
16          4 STEM       math    D          1
17          5 STEM       biology B          1
18          5 NA         NA      C          1
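
If the zero counts matter, one option is to count per student and field (rather than per subject) and then fill in the combinations that never occurred with tidyr::complete(). This is a minimal sketch and not part of the answer above: it assumes the question's coding (1 = F up to 5 = A), uses a small made-up data set, and the names grades, grade_letters and field_counts are purely illustrative.

```r
library(dplyr)
library(tidyr)

# Assumed coding from the question: position in this vector = numeric grade
grade_letters <- c("F", "D", "C", "B", "A")

# Small made-up example data
grades <- tibble(
  student_id = c(1, 1, 1, 1, 2, 2),
  subject    = c("english", "history", "biology", "math", "english", "biology"),
  grade      = c(1, 1, 2, 3, 5, 4)
)

subject_to_field <- tibble(
  subject = c("biology", "english", "history", "math"),
  field   = c("STEM", "Humanities", "Humanities", "STEM")
)

field_counts <- grades %>%
  left_join(subject_to_field, by = "subject") %>%
  # a factor keeps all five letters as levels, even those never observed
  mutate(letter = factor(grade_letters[grade], levels = grade_letters)) %>%
  count(student_id, field, letter, name = "count") %>%
  # add the student/field/letter combinations that never occurred, with count 0
  complete(student_id, field, letter, fill = list(count = 0L))

field_counts
```

To get one column per field and letter, as the question asks for, field_counts could then be reshaped with pivot_wider() and joined back onto the long data by student_id.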

