dplyr summarise function not working with global environment variables

I am trying to write a function that calculates the proportion of one column (outcome) given the value of another column. Here is the code:

thresh_measure <- function(data, indicator, thresh_value)
{
   d1 <- data %>% 
    group_by(class_number, outcome) %>%
    summarize(n=sum(indicator <= thresh_value)) %>% spread(outcome, n)
    d1$thresh_value <- thresh_value
    return(d1)
}

final_test <- thresh_measure(df, 'pass_rate', 0.8)

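There seems to be a problem in the summarize call: as written, the function returns all 0s. When I change it to the following, with the column name hard-coded, it works: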

thresh_measure <- function(data, indicator, thresh_value)
{
   d1 <- data %>% 
    group_by(class_number, outcome) %>%
    summarize(n=sum(pass_rate <= thresh_value)) %>% spread(outcome, n)
    d1$thresh_value <- thresh_value
    return(d1)
}

final_test <- thresh_measure(df, 'pass_rate', 0.8)

I tried setting the value through .GlobalEnv, and I also detached every library except dplyr, but it still does not work.

You have to deal with the name of the column that you pass as an argument. For example (there are surely better ways):

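thresh_measure <- function(data, indicator, thresh_value)
{
  d1 <- data
  # Rename the column whose name is stored in `indicator` to the literal
  # name "indicator", so that summarize() finds a column of that name.
  names(d1)[names(d1) == indicator] <- "indicator"
  d1 <- d1 %>%
    group_by(class_number, outcome) %>%
    summarize(n = sum(indicator <= thresh_value)) %>%
    spread(outcome, n)

  d1$thresh_value <- thresh_value
  return(d1)
}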
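
As a small sketch of why the original version returns all 0s: inside the function, `indicator` holds the string 'pass_rate' rather than the pass_rate column, so the comparison is between a string and a number. The snippet below uses base R only and reuses the names from the question; the ordering of "pass_rate" against "0.8" depends on the locale's collation, and it is assumed here that digits sort before letters, as they do in typical locales.

```r
# What the original summarize() call actually sees: a string, not a column.
indicator <- "pass_rate"
thresh_value <- 0.8

# R coerces the number to the string "0.8" and compares the two strings,
# so this is the same test as "pass_rate" <= "0.8".
cmp <- indicator <= thresh_value
cmp_explicit <- "pass_rate" <= "0.8"

# A single logical value per group; when it is FALSE the sum is 0.
n <- sum(cmp)
print(cmp)
print(n)
```

Renaming the column to "indicator", as above, works because dplyr looks a name up among the data frame's columns before falling back to the function's arguments.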

Two alternatives that work:

# alternative I
thresh_measure <- function(data, indicator, thresh_value)
{
    ind_quo <- rlang::sym(indicator)
    d1 <- data %>%
        group_by(class_number, outcome) %>%
        summarize(n=sum(UQ(ind_quo) <= thresh_value)) %>% spread(outcome, n)
    d1$thresh_value <- thresh_value
    return(d1)
}

final_test <- thresh_measure(df, 'pass_rate', 0.8)

# alternative II
thresh_measure <- function(data, indicator, thresh_value)
{
    ind_quo <- enquo(indicator)
    d1 <- data %>%
        group_by(class_number, outcome) %>%
        summarize(n=sum(UQ(ind_quo) <= thresh_value)) %>% spread(outcome, n)
    d1$thresh_value <- thresh_value
    return(d1)
}

final_test <- thresh_measure(df, pass_rate, 0.8)
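
For reference, `UQ()` is the older functional spelling of `!!`, and `spread()` has been superseded by `pivot_wider()`. The sketch below shows how the same two alternatives might look with current tooling, assuming dplyr 1.0 or later and tidyr 1.0 or later. The sample `df` and the function names `thresh_measure_chr` and `thresh_measure_bare` are made up for illustration and are not part of the original answers.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data; only the column names come from the question.
df <- tibble(
  class_number = c(1, 1, 1, 2, 2, 2),
  outcome      = c("a", "b", "a", "a", "b", "b"),
  pass_rate    = c(0.5, 0.9, 0.7, 0.95, 0.6, 0.85)
)

# Column name passed as a string: look it up through the .data pronoun.
thresh_measure_chr <- function(data, indicator, thresh_value) {
  data %>%
    group_by(class_number, outcome) %>%
    summarise(n = sum(.data[[indicator]] <= thresh_value), .groups = "drop") %>%
    pivot_wider(names_from = outcome, values_from = n) %>%
    mutate(thresh_value = thresh_value)
}

# Column passed as a bare name: forward it with the embrace operator {{ }}.
thresh_measure_bare <- function(data, indicator, thresh_value) {
  data %>%
    group_by(class_number, outcome) %>%
    summarise(n = sum({{ indicator }} <= thresh_value), .groups = "drop") %>%
    pivot_wider(names_from = outcome, values_from = n) %>%
    mutate(thresh_value = thresh_value)
}

res_chr  <- thresh_measure_chr(df, "pass_rate", 0.8)
res_bare <- thresh_measure_bare(df, pass_rate, 0.8)
print(res_chr)
```

Both calls should produce the same table, with one row per class_number and one count column per outcome value.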
