User defined function using mutate & case_when

I have school-level data showing the percentage of students in each racial group (e.g., Black students / total students).

My sample data is as follows:

School  Race    perc_race
1   EnrollBlack 3
2   EnrollBlack 67
3   EnrollWhite 4
4   EnrollWhite 8
5   EnrollHis   55
6   EnrollHis   88
7   EnrollAsian 43
8   EnrollAsian 34

I am trying to create one variable per race showing which tercile a school falls into. For example, if a school has 20% Black students, the value for black would be 1, because that school falls into the 1st tercile. If a school has 67% Black students, it falls into the 3rd tercile and will have "3" in the black column.

School  Race         Percent_race  black  white  hisp  asian
1       EnrollBlack  3             1
2       EnrollBlack  67            3
3       EnrollWhite  4                    1
4       EnrollWhite  8                    1
5       EnrollHis    55                          2
6       EnrollHis    88                          3
7       EnrollAsian  43                                2
8       EnrollAsian  34                                2

I could repeat this block of code for each race in my dataset, replacing the race name each time (i.e. "EnrollWhite", "EnrollHis", ...):

  mutate(black = case_when(race=='EnrollBlack' & perc_race>66.66 ~"3",
                           race=='EnrollBlack' & perc_race>33.33 ~"2",
                           race=='EnrollBlack' & perc_race<=33.33 ~"1"))

Instead of copy-pasting this 5 times, I was trying to come up with a user-defined function such as this:

  def_tercile <- function(x,y){
  mutate(y = case_when(race=='x' & perc_race>66.66 ~"3",
                           race=='x' & perc_race>33.33 ~"2",
                           race=='x' & perc_race<=33.33 ~"1"))
  }

The idea is that data %>% def_tercile(EnrollWhite, White) would return a new column giving the "white" tercile each school falls into.

I'm not sure if dplyr can be used within a function this way (it keeps throwing an error when I run the function). Any thoughts on how I should approach this?

library("tidyverse")

df <- read_table2("School  Race    perc_race
1   EnrollBlack 3
2   EnrollBlack 67
3   EnrollWhite 4
4   EnrollWhite 8
5   EnrollHis   55
6   EnrollHis   88
7   EnrollAsian 43
8   EnrollAsian 34")

To get the tercile, we can integer-divide perc_race by 100/3 (about 33.33) using %/% and add 1.

df %>%
  group_by(Race) %>%
  mutate(
    tercile = 1 + perc_race %/% (100/3)
  )
#> # A tibble: 8 x 4
#> # Groups:   Race [4]
#>   School Race        perc_race tercile
#>    <dbl> <chr>           <dbl>   <dbl>
#> 1      1 EnrollBlack         3       1
#> 2      2 EnrollBlack        67       3
#> 3      3 EnrollWhite         4       1
#> 4      4 EnrollWhite         8       1
#> 5      5 EnrollHis          55       2
#> 6      6 EnrollHis          88       3
#> 7      7 EnrollAsian        43       2
#> 8      8 EnrollAsian        34       2
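
If the tercile boundaries need to match the 33.33 and 66.66 cutoffs used in the question's case_when exactly, one possible alternative is base R's cut() with explicit breaks. This is only a sketch: the helper name tercile_cut is made up for illustration, and the breaks are copied from the question rather than computed from the data.

```r
# Hypothetical helper (not part of the original answer): assign terciles
# with explicit break points taken from the question's case_when.
# With the default right = TRUE, the intervals are
# (-Inf, 33.33], (33.33, 66.66] and (66.66, Inf).
tercile_cut <- function(perc) {
  cut(perc, breaks = c(-Inf, 33.33, 66.66, Inf), labels = FALSE)
}

# The sample percentages from the question, in row order.
perc_race <- c(3, 67, 4, 8, 55, 88, 43, 34)
terciles <- tercile_cut(perc_race)
print(terciles)
```

Inside the pipeline this could replace the arithmetic, e.g. mutate(tercile = tercile_cut(perc_race)).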

We can then use pivot_wider to give each race its own column.

df %>%
  group_by(Race) %>%
  mutate(
    tercile = 1 + perc_race %/% (100/3),
    simple_race = Race %>% str_replace("Enroll", "") %>% str_to_lower()
  ) %>%
  pivot_wider(names_from = simple_race, values_from = tercile)
#> # A tibble: 8 x 7
#> # Groups:   Race [4]
#>   School Race        perc_race black white   his asian
#>    <dbl> <chr>           <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1      1 EnrollBlack         3     1    NA    NA    NA
#> 2      2 EnrollBlack        67     3    NA    NA    NA
#> 3      3 EnrollWhite         4    NA     1    NA    NA
#> 4      4 EnrollWhite         8    NA     1    NA    NA
#> 5      5 EnrollHis          55    NA    NA     2    NA
#> 6      6 EnrollHis          88    NA    NA     3    NA
#> 7      7 EnrollAsian        43    NA    NA    NA     2
#> 8      8 EnrollAsian        34    NA    NA    NA     2

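To answer your question about dplyr functions: yes, dplyr verbs can be used inside a function. Your version fails for a few reasons. The function takes two arguments, but the pipe passes the data as a third, which gives an "unused argument" error. mutate() is never given the data. 'x' in quotes is the literal string "x", not the argument. And y = creates a column literally named y. The function you wanted can be written like this. For the function to treat race_name as a column name, we need the !! and := syntax.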

def_tercile <- function(data, race_value, race_name) {
  mutate(data,
    !!race_name := case_when(
      Race == race_value & perc_race > 66.66 ~ "3",
      Race == race_value & perc_race > 33.33 ~ "2",
      Race == race_value & perc_race <= 33.33 ~ "1")
  )
}

df %>%
  def_tercile("EnrollBlack", "black") %>%
  def_tercile("EnrollWhite", "white") %>%
  def_tercile("EnrollHis", "his") %>%
  def_tercile("EnrollAsian", "asian")
#> # A tibble: 8 x 7
#>   School Race        perc_race black white his   asian
#>    <dbl> <chr>           <dbl> <chr> <chr> <chr> <chr>
#> 1      1 EnrollBlack         3 1     NA    NA    NA   
#> 2      2 EnrollBlack        67 3     NA    NA    NA   
#> 3      3 EnrollWhite         4 NA    1     NA    NA   
#> 4      4 EnrollWhite         8 NA    1     NA    NA   
#> 5      5 EnrollHis          55 NA    NA    2     NA   
#> 6      6 EnrollHis          88 NA    NA    3     NA   
#> 7      7 EnrollAsian        43 NA    NA    NA    2    
#> 8      8 EnrollAsian        34 NA    NA    NA    2  
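
As a possible variation on the function above, recent versions of rlang and dplyr (assumed here: rlang 0.4.3 or later with dplyr 1.0 or later) also accept a glue-style string on the left of :=, and the repeated calls can be driven by a loop over a named vector. The names def_tercile_glue and races below are made up for this sketch, and the data frame is rebuilt by hand from the sample data so the block runs on its own.

```r
library(dplyr)

# Hypothetical variant of def_tercile(): the new column name is built
# with a glue-style string instead of !!.
def_tercile_glue <- function(data, race_value, race_name) {
  mutate(
    data,
    "{race_name}" := case_when(
      Race == race_value & perc_race > 66.66 ~ "3",
      Race == race_value & perc_race > 33.33 ~ "2",
      Race == race_value & perc_race <= 33.33 ~ "1"
    )
  )
}

# Sample data from the question, typed in by hand.
df <- tibble(
  School = 1:8,
  Race = rep(c("EnrollBlack", "EnrollWhite", "EnrollHis", "EnrollAsian"), each = 2),
  perc_race = c(3, 67, 4, 8, 55, 88, 43, 34)
)

# Assumed mapping: new column name -> value in the Race column.
races <- c(black = "EnrollBlack", white = "EnrollWhite",
           his = "EnrollHis", asian = "EnrollAsian")

# Apply the function once per race instead of chaining four calls.
result <- df
for (nm in names(races)) {
  result <- def_tercile_glue(result, races[[nm]], nm)
}
print(result)
```

Rows that do not belong to a given race match none of the case_when conditions and are left as NA, the same as in the chained version.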
