dplyr summarise function not working with global environment variables
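I am trying to write a function that calculates the proportion of one column (outcome) given the value of another column. The code is as follows: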
thresh_measure <- function(data, indicator, thresh_value)
{
  d1 <- data %>%
    group_by(class_number, outcome) %>%
    summarize(n = sum(indicator <= thresh_value)) %>%
    spread(outcome, n)
  d1$thresh_value <- thresh_value
  return(d1)
}
final_test <- thresh_measure(df, 'pass_rate', 0.8)
There seems to be a bug in the summarize call: the function as written returns all 0s. When I change it to the following, it works:
thresh_measure <- function(data, indicator, thresh_value)
{
  d1 <- data %>%
    group_by(class_number, outcome) %>%
    summarize(n = sum(pass_rate <= thresh_value)) %>%
    spread(outcome, n)
  d1$thresh_value <- thresh_value
  return(d1)
}
final_test <- thresh_measure(df, 'pass_rate', 0.8)
I tried setting the value with .GlobalEnv, and I also detached every library except dplyr, but it still doesn't work.
You have to deal with the name of the column that is passed as an argument. For example (there is certainly a better way):
thresh_measure <- function(data, indicator, thresh_value)
{
  d1 <- data
  # rename the column whose name is stored in `indicator` to the literal
  # name "indicator", so that summarize() finds it as a column
  names(d1)[names(d1) == indicator] <- "indicator"
  d1 <- d1 %>%
    group_by(class_number, outcome) %>%
    summarize(n = sum(indicator <= thresh_value)) %>%
    spread(outcome, n)
  d1$thresh_value <- thresh_value
  return(d1)
}
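As for why the original version returns 0 everywhere: inside summarize(), `indicator` is still the string 'pass_rate', not the column, so R ends up comparing a string with a number. A small base-R sketch of that comparison (the `pass_rate` vector below is made up for illustration):

```r
indicator    <- "pass_rate"   # what the function actually receives
thresh_value <- 0.8

# R coerces the number to character, so this is the string comparison
# "pass_rate" <= "0.8". It is FALSE in common locales, where digits
# sort before letters.
indicator <= thresh_value

# Made-up column values, only to show the comparison that was intended.
pass_rate <- c(0.5, 0.9, 0.7)
sum(pass_rate <= thresh_value)   # 2: counts values at or below the threshold
sum(indicator <= thresh_value)   # sum of a single FALSE
```

So each group gets the sum of one FALSE, which is why every cell comes out as 0 rather than raising an error.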
Two working alternatives:
# alternative I: pass the column name as a string
thresh_measure <- function(data, indicator, thresh_value)
{
  # turn the string into a symbol that can be unquoted
  ind_quo <- rlang::sym(indicator)
  d1 <- data %>%
    group_by(class_number, outcome) %>%
    # `!!` replaces the deprecated UQ(); the parentheses keep the
    # unquoting separate from the comparison
    summarize(n = sum((!!ind_quo) <= thresh_value)) %>%
    spread(outcome, n)
  d1$thresh_value <- thresh_value
  return(d1)
}
final_test <- thresh_measure(df, 'pass_rate', 0.8)

# alternative II: pass the column name unquoted
thresh_measure <- function(data, indicator, thresh_value)
{
  # capture the unevaluated argument as a quosure
  ind_quo <- enquo(indicator)
  d1 <- data %>%
    group_by(class_number, outcome) %>%
    summarize(n = sum((!!ind_quo) <= thresh_value)) %>%
    spread(outcome, n)
  d1$thresh_value <- thresh_value
  return(d1)
}
final_test <- thresh_measure(df, pass_rate, 0.8)
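Newer dplyr/rlang versions offer shorter spellings of the same two ideas: the `.data` pronoun for a column name held in a string, and `{{ }}` for an unquoted column name. A minimal sketch, assuming dplyr >= 1.0.0 (for `.groups`) and rlang >= 0.4.0 (for `{{ }}`); the data frame and the function names `thresh_measure_chr` / `thresh_measure_sym` are made up for illustration and are not from the question:

```r
library(dplyr)
library(tidyr)

# Made-up data, only for illustration.
df <- data.frame(
  class_number = c(1, 1, 1, 2, 2, 2),
  outcome      = c("fail", "pass", "pass", "fail", "fail", "pass"),
  pass_rate    = c(0.5, 0.9, 0.95, 0.6, 0.75, 0.4)
)

# Column name passed as a string: index the .data pronoun with it.
thresh_measure_chr <- function(data, indicator, thresh_value) {
  d1 <- data %>%
    group_by(class_number, outcome) %>%
    summarise(n = sum(.data[[indicator]] <= thresh_value), .groups = "drop") %>%
    spread(outcome, n)
  d1$thresh_value <- thresh_value
  d1
}

# Column name passed unquoted: embrace the argument with {{ }}.
thresh_measure_sym <- function(data, indicator, thresh_value) {
  d1 <- data %>%
    group_by(class_number, outcome) %>%
    summarise(n = sum({{ indicator }} <= thresh_value), .groups = "drop") %>%
    spread(outcome, n)
  d1$thresh_value <- thresh_value
  d1
}

res_chr <- thresh_measure_chr(df, "pass_rate", 0.8)
res_sym <- thresh_measure_sym(df, pass_rate, 0.8)
```

With this toy data both calls should give one row per class_number, with the per-outcome counts of rows at or below the threshold in the `fail` and `pass` columns.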