User-defined function using mutate & case_when
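I have school-level data showing the percentage of students in each racial group (e.g., Black students / total students).
My sample data looks like this: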
School Race perc_race
1 EnrollBlack 3
2 EnrollBlack 67
3 EnrollWhite 4
4 EnrollWhite 8
5 EnrollHis 55
6 EnrollHis 88
7 EnrollAsian 43
8 EnrollAsian 34
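I am trying to create a dummy variable for each race that shows which tercile a school falls into. For example, if a school has 20% Black students, the value for black would be 1, because that school falls into the first tercile. If a school has 67% Black students, it falls into the third tercile and would have a "3" in the black column.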
School Race        Percent_race black white hisp asian
1      EnrollBlack 3            1
2      EnrollBlack 67           3
3      EnrollWhite 4                  1
4      EnrollWhite 8                  1
5      EnrollHis   55                       2
6      EnrollHis   88                       3
7      EnrollAsian 43                            2
8      EnrollAsian 34                            2
I could repeat this code block for each race in my dataset, replacing the race value accordingly (i.e., "EnrollWhite", "EnrollHis", ...):
mutate(black = case_when(race=='EnrollBlack' & perc_race>66.66 ~"3",
race=='EnrollBlack' & perc_race>33.33 ~"2",
race=='EnrollBlack' & perc_race<=33.33 ~"1"))
Rather than copy-pasting this 5 times, I tried to come up with a user-defined function, something like this:
def_tercile <- function(x,y){
mutate(y = case_when(race=='x' & perc_race>66.66 ~"3",
race=='x' & perc_race>33.33 ~"2",
race=='x' & perc_race<=33.33 ~"1"))
}
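where data %>% def_tercile(EnrollWhite, White) would return a new column defining which "white" tercile the school falls into.
I'm not sure whether dplyr can be used inside a function this way (it keeps throwing errors when I run the function). Any ideas on how I should approach this?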
library("tidyverse")
df <- read_table2("School Race perc_race
1 EnrollBlack 3
2 EnrollBlack 67
3 EnrollWhite 4
4 EnrollWhite 8
5 EnrollHis 55
6 EnrollHis 88
7 EnrollAsian 43
8 EnrollAsian 34")
To get the tercile, we can integer-divide by 33.33 (that is, 100/3) and add 1.
df %>%
group_by(Race) %>%
mutate(
tercile = 1 + perc_race %/% (100/3)
)
#> # A tibble: 8 x 4
#> # Groups: Race [4]
#> School Race perc_race tercile
#> <dbl> <chr> <dbl> <dbl>
#> 1 1 EnrollBlack 3 1
#> 2 2 EnrollBlack 67 3
#> 3 3 EnrollWhite 4 1
#> 4 4 EnrollWhite 8 1
#> 5 5 EnrollHis 55 2
#> 6 6 EnrollHis 88 3
#> 7 7 EnrollAsian 43 2
#> 8 8 EnrollAsian 34 2
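As a side note, the integer-division shortcut uses cut points at exact multiples of 100/3, while the case_when in the question uses 33.33 and 66.66, so values very close to a boundary could land in different bins. If the question's thresholds need to be matched exactly, one possible alternative is base R's cut(). This is only a sketch: the perc_race vector below is retyped from the sample data, and the breaks are an assumption taken from the question's code.

```r
# Percentages retyped from the sample data (same row order as df)
perc_race <- c(3, 67, 4, 8, 55, 88, 43, 34)

# Bin with the same thresholds as the question's case_when:
# (-Inf, 33.33] -> "1", (33.33, 66.66] -> "2", (66.66, Inf) -> "3"
tercile <- cut(
  perc_race,
  breaks = c(-Inf, 33.33, 66.66, Inf),
  labels = c("1", "2", "3"),
  right = TRUE  # intervals are closed on the right, matching `<= 33.33`
)

# cut() returns a factor; convert to character to match the case_when output
tercile_chr <- as.character(tercile)
print(tercile_chr)
```

Inside a pipeline, the same call could be placed in mutate() in place of the integer-division expression.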
Then we can use pivot_wider to give them their own columns.
df %>%
group_by(Race) %>%
mutate(
tercile = 1 + perc_race %/% (100/3),
simple_race = Race %>% str_replace("Enroll", "") %>% str_to_lower()
) %>%
pivot_wider(names_from = simple_race, values_from = tercile)
#> # A tibble: 8 x 7
#> # Groups: Race [4]
#> School Race perc_race black white his asian
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 EnrollBlack 3 1 NA NA NA
#> 2 2 EnrollBlack 67 3 NA NA NA
#> 3 3 EnrollWhite 4 NA 1 NA NA
#> 4 4 EnrollWhite 8 NA 1 NA NA
#> 5 5 EnrollHis 55 NA NA 2 NA
#> 6 6 EnrollHis 88 NA NA 3 NA
#> 7 7 EnrollAsian 43 NA NA NA 2
#> 8 8 EnrollAsian 34 NA NA NA 2
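To answer your question about dplyr functions: the function you wanted to define can be written as follows. For the function to treat race_name as a column name, we need to use the !! and := syntax.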
def_tercile <- function(data, race_value, race_name) {
mutate(data,
!!race_name := case_when(
Race == race_value & perc_race > 66.66 ~ "3",
Race == race_value & perc_race > 33.33 ~"2",
Race == race_value & perc_race <= 33.33 ~"1")
)
}
df %>%
def_tercile("EnrollBlack", "black") %>%
def_tercile("EnrollWhite", "white") %>%
def_tercile("EnrollHis", "his") %>%
def_tercile("EnrollAsian", "asian")
#> # A tibble: 8 x 7
#> School Race perc_race black white his asian
#> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 1 EnrollBlack 3 1 NA NA NA
#> 2 2 EnrollBlack 67 3 NA NA NA
#> 3 3 EnrollWhite 4 NA 1 NA NA
#> 4 4 EnrollWhite 8 NA 1 NA NA
#> 5 5 EnrollHis 55 NA NA 2 NA
#> 6 6 EnrollHis 88 NA NA 3 NA
#> 7 7 EnrollAsian 43 NA NA NA 2
#> 8 8 EnrollAsian 34 NA NA NA 2
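For context, the attempt in the question most likely fails for three reasons: the function never receives the data frame, 'x' in quotes is the literal string "x" rather than the argument, and y = creates a column literally named y. As a variation on the function above, here is a minimal sketch that uses the glue-style name syntax ("{race_name}" :=) available in recent versions of dplyr/rlang, and loops over the races instead of chaining four calls. The names def_tercile_glue and races are made up for this illustration, and the data frame is rebuilt so the block runs on its own.

```r
library(dplyr)

# Sample data from the question, rebuilt here so the block is self-contained
df <- data.frame(
  School = 1:8,
  Race = rep(c("EnrollBlack", "EnrollWhite", "EnrollHis", "EnrollAsian"), each = 2),
  perc_race = c(3, 67, 4, 8, 55, 88, 43, 34)
)

# Hypothetical variant of def_tercile: "{race_name}" := names the new column
# from the string held in race_name (an alternative to !!race_name :=)
def_tercile_glue <- function(data, race_value, race_name) {
  mutate(
    data,
    "{race_name}" := case_when(
      Race == race_value & perc_race > 66.66 ~ "3",
      Race == race_value & perc_race > 33.33 ~ "2",
      Race == race_value & perc_race <= 33.33 ~ "1"
    )
  )
}

# Named vector: names are the new column names, values are the Race codes
races <- c(black = "EnrollBlack", white = "EnrollWhite",
           his = "EnrollHis", asian = "EnrollAsian")

# Apply the function once per race, carrying the result forward
result <- df
for (nm in names(races)) {
  result <- def_tercile_glue(result, races[[nm]], nm)
}
print(result)
```

Rows that do not match the given race get NA in that column, because case_when returns NA when no condition is met.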