dplyr summarise function not working with global environment variables

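I am trying to write a function that calculates the proportion of one column (outcome) given the values of another column. The code is as follows: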

thresh_measure <- function(data, indicator, thresh_value)
{
    d1 <- data %>%
        group_by(class_number, outcome) %>%
        summarize(n=sum(indicator <= thresh_value)) %>% spread(outcome, n)
    d1$thresh_value <- thresh_value
    return(d1)
}

final_test <- thresh_measure(df, 'pass_rate', 0.8)

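There seems to be a problem in the summarize call: the function as written returns all 0s. When I change it to the version below, with the column name hard-coded, it works: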

thresh_measure <- function(data, indicator, thresh_value)
{
    d1 <- data %>%
        group_by(class_number, outcome) %>%
        summarize(n=sum(pass_rate <= thresh_value)) %>% spread(outcome, n)
    d1$thresh_value <- thresh_value
    return(d1)
}

final_test <- thresh_measure(df, 'pass_rate', 0.8)

I tried setting the value through .GlobalEnv, and I also detached every library except dplyr, but it still does not work.

You have to handle the name of the column you want to pass as an argument. For example (there is surely a better way):

thresh_measure <- function(data, indicator, thresh_value)
{
    d1 <- data
    # rename the requested column to the literal name "indicator",
    # so that the summarize() call below refers to a real column
    names(d1)[names(d1) == indicator] <- "indicator"
    d1 <- d1 %>%
        group_by(class_number, outcome) %>%
        summarize(n=sum(indicator <= thresh_value)) %>% spread(outcome, n)

    d1$thresh_value <- thresh_value
    return(d1)
}
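
As a side note on why the original version returned all 0s: inside summarize(), indicator is not a column of the data, so it resolves to the function argument, which is the string 'pass_rate'. The snippet below is a minimal sketch of that comparison in plain R (the variable names simply mirror the function arguments; nothing here is dplyr-specific):

```r
indicator <- "pass_rate"   # what the function actually receives
thresh_value <- 0.8

# The number is coerced to the string "0.8" and the two are compared as text,
# so no pass_rate values are ever looked at.
cmp <- indicator <= thresh_value
is_string <- is.character(indicator)

print(cmp)        # FALSE in the usual locales, where digits sort before letters
print(sum(cmp))   # 0, which is the count each group ended up with
```

Every group therefore gets the same single FALSE, and its sum is 0.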

Two workable alternatives:

# alternative I: pass the column name as a string
thresh_measure <- function(data, indicator, thresh_value)
{
    # turn the string into a symbol that can be unquoted with !!
    ind_sym <- rlang::sym(indicator)
    d1 <- data %>%
        group_by(class_number, outcome) %>%
        summarize(n=sum(!!ind_sym <= thresh_value)) %>% spread(outcome, n)
    d1$thresh_value <- thresh_value
    return(d1)
}

final_test <- thresh_measure(df, 'pass_rate', 0.8)

# alternative II: pass the column name unquoted
thresh_measure <- function(data, indicator, thresh_value)
{
    # capture the bare column name as a quosure, then unquote it with !!
    ind_quo <- enquo(indicator)
    d1 <- data %>%
        group_by(class_number, outcome) %>%
        summarize(n=sum(!!ind_quo <= thresh_value)) %>% spread(outcome, n)
    d1$thresh_value <- thresh_value
    return(d1)
}

final_test <- thresh_measure(df, pass_rate, 0.8)
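
For what it is worth, the same two alternatives can probably be written a bit more compactly with the .data pronoun (string input) and the embrace operator {{ }} (bare-name input). This is only a sketch: it assumes dplyr 1.0.0 or later and rlang 0.4.0 or later, the function names thresh_measure_chr and thresh_measure_bare are made up here, and the toy df is hypothetical because the original data frame is not shown in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the df from the question
df <- data.frame(
  class_number = c(1, 1, 1, 2, 2, 2),
  outcome      = c("fail", "pass", "pass", "fail", "fail", "pass"),
  pass_rate    = c(0.5, 0.9, 0.7, 0.6, 0.85, 0.95)
)

# Column name given as a string: look it up through the .data pronoun
thresh_measure_chr <- function(data, indicator, thresh_value) {
  d1 <- data %>%
    group_by(class_number, outcome) %>%
    summarize(n = sum(.data[[indicator]] <= thresh_value), .groups = "drop") %>%
    spread(outcome, n)
  d1$thresh_value <- thresh_value
  d1
}

# Column name given bare: forward it with the embrace operator
thresh_measure_bare <- function(data, indicator, thresh_value) {
  d1 <- data %>%
    group_by(class_number, outcome) %>%
    summarize(n = sum({{ indicator }} <= thresh_value), .groups = "drop") %>%
    spread(outcome, n)
  d1$thresh_value <- thresh_value
  d1
}

res_chr  <- thresh_measure_chr(df, "pass_rate", 0.8)
res_bare <- thresh_measure_bare(df, pass_rate, 0.8)
print(res_chr)
```

Both calls should give the same table, with one row per class_number and one count column per outcome value.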
