User-defined function using mutate & case_when
I have school-level data showing the percentage of students in each racial group (e.g., Black students / total students).
My sample data is as follows:
School Race perc_race
1 EnrollBlack 3
2 EnrollBlack 67
3 EnrollWhite 4
4 EnrollWhite 8
5 EnrollHis 55
6 EnrollHis 88
7 EnrollAsian 43
8 EnrollAsian 34
I am trying to create one dummy variable for each race, showing which tercile a school falls into. For example, if a school has 20% Black students, the value for black would be 1, because that school falls into the 1st tercile. If a school has 67% Black students, it falls into the 3rd tercile and will have "3" in the black column.
School  Race         perc_race  black  white  hisp  asian
1       EnrollBlack  3          1
2       EnrollBlack  67         3
3       EnrollWhite  4                 1
4       EnrollWhite  8                 1
5       EnrollHis    55                       2
6       EnrollHis    88                       3
7       EnrollAsian  43                             2
8       EnrollAsian  34                             2
I can repeat this block of code for each race in my dataset, replacing the race accordingly (i.e., "EnrollWhite", "EnrollHis", ...):
mutate(black = case_when(race=='EnrollBlack' & perc_race>66.66 ~"3",
race=='EnrollBlack' & perc_race>33.33 ~"2",
race=='EnrollBlack' & perc_race<=33.33 ~"1"))
Instead of copy-pasting this 5 times, I was trying to come up with a user-defined function such as this:
def_tercile <- function(x,y){
mutate(y = case_when(race=='x' & perc_race>66.66 ~"3",
race=='x' & perc_race>33.33 ~"2",
race=='x' & perc_race<=33.33 ~"1"))
}
Here, data %>% def_tercile(EnrollWhite, White) would return a new column giving the "white" tercile each school falls into.
I'm not sure whether dplyr can be used inside a function this way (it keeps throwing an error when I run the function). Any thoughts on how I should approach this?
library("tidyverse")
df <- read_table2("School Race perc_race
1 EnrollBlack 3
2 EnrollBlack 67
3 EnrollWhite 4
4 EnrollWhite 8
5 EnrollHis 55
6 EnrollHis 88
7 EnrollAsian 43
8 EnrollAsian 34")
To get the tercile, we can just integer-divide by 100/3 (about 33.33) and add 1.
df %>%
group_by(Race) %>%
mutate(
tercile = 1 + perc_race %/% (100/3)
)
#> # A tibble: 8 x 4
#> # Groups: Race [4]
#> School Race perc_race tercile
#> <dbl> <chr> <dbl> <dbl>
#> 1 1 EnrollBlack 3 1
#> 2 2 EnrollBlack 67 3
#> 3 3 EnrollWhite 4 1
#> 4 4 EnrollWhite 8 1
#> 5 5 EnrollHis 55 2
#> 6 6 EnrollHis 88 3
#> 7 7 EnrollAsian 43 2
#> 8 8 EnrollAsian 34 2
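One caveat with the integer-division shortcut: a school at exactly 100% sits on the upper boundary and may be assigned a fourth bin, and the cut points are not quite the same as the `case_when` thresholds in the question. Below is a minimal sketch of an alternative using base R's `cut()`, assuming right-closed bins at exact thirds. `tercile_of` is a hypothetical helper name, not something from the original post.

```r
# Hypothetical helper: map a percentage (0-100) to tercile 1, 2 or 3.
# Assumes bins (-Inf, 100/3], (100/3, 200/3], (200/3, Inf).
tercile_of <- function(perc) {
  cut(
    perc,
    breaks = c(-Inf, 100 / 3, 200 / 3, Inf),
    labels = FALSE  # return integer bin codes instead of a factor
  )
}

tercile_of(c(3, 67, 55, 100))
```

Inside the pipeline this could replace the arithmetic, e.g. `mutate(tercile = tercile_of(perc_race))`.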
We can then use pivot_wider to give each race its own column.
df %>%
group_by(Race) %>%
mutate(
tercile = 1 + perc_race %/% (100/3),
simple_race = Race %>% str_replace("Enroll", "") %>% str_to_lower()
) %>%
pivot_wider(names_from = simple_race, values_from = tercile)
#> # A tibble: 8 x 7
#> # Groups: Race [4]
#> School Race perc_race black white his asian
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 EnrollBlack 3 1 NA NA NA
#> 2 2 EnrollBlack 67 3 NA NA NA
#> 3 3 EnrollWhite 4 NA 1 NA NA
#> 4 4 EnrollWhite 8 NA 1 NA NA
#> 5 5 EnrollHis 55 NA NA 2 NA
#> 6 6 EnrollHis 88 NA NA 3 NA
#> 7 7 EnrollAsian 43 NA NA NA 2
#> 8 8 EnrollAsian 34 NA NA NA 2
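Because the tercile calculation is purely element-wise, the `group_by(Race)` step does not appear to be needed for the result. The sketch below runs the same steps without grouping, so the output is an ordinary ungrouped tibble. The data frame is rebuilt with `tribble()` only to keep the example self-contained, and `str_remove()` is swapped in for `str_replace(..., "")`.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Same sample data as above, rebuilt so the example runs on its own
df <- tribble(
  ~School, ~Race,         ~perc_race,
  1,       "EnrollBlack",  3,
  2,       "EnrollBlack", 67,
  3,       "EnrollWhite",  4,
  4,       "EnrollWhite",  8,
  5,       "EnrollHis",   55,
  6,       "EnrollHis",   88,
  7,       "EnrollAsian", 43,
  8,       "EnrollAsian", 34
)

wide <- df %>%
  mutate(
    tercile = 1 + perc_race %/% (100 / 3),
    # "EnrollBlack" -> "black", used as the new column name
    simple_race = str_to_lower(str_remove(Race, "^Enroll"))
  ) %>%
  pivot_wider(names_from = simple_race, values_from = tercile)

wide
```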
To answer your question about dplyr functions, the function you wanted to define can be written like this. For the function to handle race_name as a column name, we need to use the !! and := syntax.
def_tercile <- function(data, race_value, race_name) {
mutate(data,
!!race_name := case_when(
Race == race_value & perc_race > 66.66 ~ "3",
Race == race_value & perc_race > 33.33 ~"2",
Race == race_value & perc_race <= 33.33 ~"1")
)
}
df %>%
def_tercile("EnrollBlack", "black") %>%
def_tercile("EnrollWhite", "white") %>%
def_tercile("EnrollHis", "his") %>%
def_tercile("EnrollAsian", "asian")
#> # A tibble: 8 x 7
#> School Race perc_race black white his asian
#> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 1 EnrollBlack 3 1 NA NA NA
#> 2 2 EnrollBlack 67 3 NA NA NA
#> 3 3 EnrollWhite 4 NA 1 NA NA
#> 4 4 EnrollWhite 8 NA 1 NA NA
#> 5 5 EnrollHis 55 NA NA 2 NA
#> 6 6 EnrollHis 88 NA NA 3 NA
#> 7 7 EnrollAsian 43 NA NA NA 2
#> 8 8 EnrollAsian 34 NA NA NA 2
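As a variation on the function above, recent versions of dplyr and rlang also accept a glue-style string on the left-hand side of `:=`, and the four calls can be driven from a named vector. This is only a sketch under those version assumptions. The `races` vector and the loop are illustrative additions, not part of the original answer, and the data frame is rebuilt so the example is self-contained.

```r
library(dplyr)
library(tibble)

df <- tribble(
  ~School, ~Race,         ~perc_race,
  1,       "EnrollBlack",  3,
  2,       "EnrollBlack", 67,
  3,       "EnrollWhite",  4,
  4,       "EnrollWhite",  8,
  5,       "EnrollHis",   55,
  6,       "EnrollHis",   88,
  7,       "EnrollAsian", 43,
  8,       "EnrollAsian", 34
)

# Same logic as def_tercile above; "{race_name}" interpolates the
# string into the new column name (assumes rlang >= 0.4.0).
def_tercile <- function(data, race_value, race_name) {
  mutate(
    data,
    "{race_name}" := case_when(
      Race == race_value & perc_race > 66.66  ~ "3",
      Race == race_value & perc_race > 33.33  ~ "2",
      Race == race_value & perc_race <= 33.33 ~ "1"
    )
  )
}

# Illustrative mapping: new column name -> value in the Race column
races <- c(
  black = "EnrollBlack",
  white = "EnrollWhite",
  his   = "EnrollHis",
  asian = "EnrollAsian"
)

# Add one tercile column per race
result <- df
for (nm in names(races)) {
  result <- def_tercile(result, races[[nm]], nm)
}

result
```

Rows whose `Race` does not match the requested value match no `case_when` branch and are left as `NA`, as in the output above.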