How to get the sum of combinations of variables of 2 columns in a tibble in R
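I have a long dataset of student grades and subjects. I want to keep it long, but add columns that tell me how many Fs a student has in their humanities classes (English and history) and in their STEM classes (biology and math). I want the same for Ds, Cs, Bs, and As.
I know I could spell this out explicitly, but in the future there may be other subjects (say, chemistry added to STEM) or entirely different categories, such as foreign languages, so I want the solution to be scalable.
I know how to get all combinations of the columns, and I know how to do each piece by hand, but I can't figure out how to combine the two. Any help would be appreciated!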
# Sample data
library(tidyverse)

student_grades <- tibble(
  student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
  subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
  grade = c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4)
)

# All combinations of grades and subjects
all_subject_combos <- c("eng|his", "bio|math")
all_grades <- c("F", "D", "C", "B", "A")

subjects_and_letter_grades <- expand.grid(all_subject_combos, all_grades)

all_combos <- subjects_and_letter_grades %>%
  unite("names", c(Var1, Var2)) %>%
  mutate(names = str_replace_all(names, "\\|", "_")) %>%
  pull(names)

# Manual generation of numbers of Fs by subject
# This is what I want the results to look like, but with all other letter grades
student_grades %>%
  group_by(student_id) %>%
  mutate(
    eng_his_F = sum((case_when(
      str_detect(subject, "eng|his") & grade == 1 ~ 1,
      TRUE ~ 0)), na.rm = TRUE),
    bio_math_F = sum((case_when(
      str_detect(subject, "bio|math") & grade == 1 ~ 1,
      TRUE ~ 0)), na.rm = TRUE)
  ) %>%
  ungroup()
Ideally this would scale to any number of subject combinations and wouldn't require me to write the same code again for Ds, Cs, Bs, and As. Thanks!
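We can use map to loop over the all_combos vector. In each iteration, group by 'student_id' (this could also be done once outside the loop, storing the grouped object for reuse), create a new column named after the current element by unquoting it with !! and assigning with the := operator, fill it with the sum of the case_when result, and finally bind the new columns to the original data.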
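library(dplyr)
library(purrr)
library(stringr)

# For each name in all_combos, build one per-student count column,
# then bind all of the new columns onto the original data.
map_dfc(all_combos, ~ student_grades %>%
  group_by(student_id) %>%
  transmute(!!.x := sum(case_when(
    # rebuild the subject regex, e.g. "eng_his_F" -> "eng|his"
    str_detect(subject, str_replace(.x, "(\\w+)_(\\w+)_.", "\\1|\\2")) &
      # trailing letter -> numeric grade via its position in all_grades
      grade == match(str_extract(.x, ".$"), all_grades) ~ 1,
    TRUE ~ 0
  ))) %>%
  ungroup() %>%
  dplyr::select(-student_id)) %>%
  bind_cols(student_grades, .)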
Output:
# A tibble: 18 × 13
   student_id subject grade eng_his_F bio_math_F eng_his_D bio_math_D eng_his_C bio_math_C eng_his_B bio_math_B eng_hi…¹ bio_m…²
        <dbl> <chr>   <dbl>     <dbl>      <dbl>     <dbl>      <dbl>     <dbl>      <dbl>     <dbl>      <dbl>    <dbl>   <dbl>
 1          1 english     1         1          0         0          1         0          1         1          0        0       0
 2          1 biology     2         1          0         0          1         0          1         1          0        0       0
 3          1 math        3         1          0         0          1         0          1         1          0        0       0
 4          1 history     4         1          0         0          1         0          1         1          0        0       0
 5          2 english     5         0          0         1          0         0          1         0          1        1       0
 6          2 biology     4         0          0         1          0         0          1         0          1        1       0
 7          2 math        3         0          0         1          0         0          1         0          1        1       0
 8          2 history     2         0          0         1          0         0          1         0          1        1       0
 9          3 english     2         1          1         1          0         0          0         0          1        0       0
10          3 biology     4         1          1         1          0         0          0         0          1        0       0
11          3 math        1         1          1         1          0         0          0         0          1        0       0
12          3 history     1         1          1         1          0         0          0         0          1        0       0
13          4 english     1         1          1         0          1         1          0         0          0        0       0
14          4 biology     1         1          1         0          1         1          0         0          0        0       0
15          4 math        2         1          1         0          1         1          0         0          0        0       0
16          4 history     3         1          1         0          1         1          0         0          0        0       0
17          5 <NA>        3         0          0         0          0         0          0         0          1        0       0
18          5 biology     4         0          0         0          0         0          0         0          1        0       0
# … with abbreviated variable names ¹eng_his_A, ²bio_math_A
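To make the name decoding inside that loop easier to follow, here is a minimal sketch that applies the same two steps to a single column name. The names `nm`, `subject_pattern`, `grade_value`, and the toy tibble `toy` are illustrative only and do not appear in the answer.

```r
library(dplyr)
library(stringr)

all_grades <- c("F", "D", "C", "B", "A")
nm <- "bio_math_F"  # hypothetical: stands in for one element of all_combos

# Turn "bio_math_F" back into the regex "bio|math"
subject_pattern <- str_replace(nm, "(\\w+)_(\\w+)_.", "\\1|\\2")

# Turn the trailing letter into its numeric grade (its position in all_grades)
grade_value <- match(str_extract(nm, ".$"), all_grades)

# `!!nm :=` uses the string stored in nm as the name of the new column
toy <- tibble(subject = c("biology", "math", "english"), grade = c(1, 1, 1))
result <- toy %>%
  mutate(!!nm := sum(str_detect(subject, subject_pattern) & grade == grade_value))
```

As far as I can tell, the pattern assumes each name holds exactly two subject abbreviations followed by one letter. A name such as "bio_math_chem_F" would likely decode to "bio_math|chem", so a three-subject group would need a different decoding step.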
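Here is another way of looking at it. I use a small mapping table (subject_to_field) that maps each subject to its field (english -> Humanities, math -> STEM, and so on). I think this helps with scalability. You will need to maintain this table as subjects are added or removed.
A left_join then attaches the field to the student_grades tibble.
Adding the 'grade2' column isn't required, but it improves readability. After that, all we need to do is group appropriately and count. With this approach you do not get zero counts for grades a student never received.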
library(tidyverse)

student_grades <- tibble(
  student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
  subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
  grade = c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4)
)

# Readable letter grades.
# NOTE: this mapping treats 1 as "A" and 5 as "F"; the question treats 1 as "F"
# (the order of all_grades). Reverse the labels to match the question.
student_grades <- student_grades %>%
  mutate(grade2 = case_when(
    grade == 1 ~ "A",
    grade == 2 ~ "B",
    grade == 3 ~ "C",
    grade == 4 ~ "D",
    grade == 5 ~ "F"))

# Mapping table: one row per subject, giving the field it belongs to
subject_to_field <- tibble(
  subject = c("biology", "english", "history", "math"),
  field = c("STEM", "Humanities", "Humanities", "STEM")
)

# Attach the field to every grade row
student_grades <- student_grades %>%
  left_join(subject_to_field, by = "subject")

# Count rows per student, field, subject, and letter grade
student_summary <- student_grades %>%
  group_by(student_id, field, subject, grade2) %>%
  summarise(count = n())
This gives you the following output:
> student_summary
# A tibble: 18 × 5
# Groups:   student_id, field, subject [18]
   student_id field      subject grade2 count
        <dbl> <chr>      <chr>   <chr>  <int>
 1          1 Humanities english A          1
 2          1 Humanities history D          1
 3          1 STEM       biology B          1
 4          1 STEM       math    C          1
 5          2 Humanities english F          1
 6          2 Humanities history B          1
 7          2 STEM       biology D          1
 8          2 STEM       math    C          1
 9          3 Humanities english B          1
10          3 Humanities history A          1
11          3 STEM       biology D          1
12          3 STEM       math    A          1
13          4 Humanities english A          1
14          4 Humanities history C          1
15          4 STEM       biology A          1
16          4 STEM       math    B          1
17          5 STEM       biology D          1
18          5 NA         NA      C          1
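If the wide per-student columns from the question are wanted, including zeros, the mapping-table idea can be extended roughly as follows. This is only a sketch under some assumptions: grades follow the question's convention (1 = "F" through 5 = "A"), rows whose subject has no entry in the mapping table are ignored when counting, and the names `field_counts` and `student_wide` are made up for illustration.

```r
library(dplyr)
library(tidyr)

student_grades <- tibble(
  student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
  subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
  grade = c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4)
)

subject_to_field <- tibble(
  subject = c("biology", "english", "history", "math"),
  field = c("STEM", "Humanities", "Humanities", "STEM")
)

# Assumption: 1 = "F" ... 5 = "A", following the order of the question's all_grades
all_grades <- c("F", "D", "C", "B", "A")

field_counts <- student_grades %>%
  inner_join(subject_to_field, by = "subject") %>%  # drops rows with no known field
  mutate(letter = factor(all_grades[grade], levels = all_grades)) %>%
  count(student_id, field, letter) %>%
  # add the missing student/field/letter combinations with a count of zero
  complete(student_id, field, letter, fill = list(n = 0L)) %>%
  pivot_wider(names_from = c(field, letter), values_from = n)

# Keep the data long: attach the per-student counts to every original row
student_wide <- student_grades %>%
  left_join(field_counts, by = "student_id")
```

Adding a subject or a whole new field then only means adding rows to subject_to_field. A student with no mapped subjects at all would get NA rather than 0 in the count columns, so that case may need separate handling.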