How do I pass a variable name to conditionally sum in dplyr pipe?

The crux of the problem is how to pass a column variable into a grouped data frame to conditionally sum data. Data for the example follows:

library(dplyr)
library(rlang)
set.seed(1)

# dummy dates
date_vars <- purrr::map(c('2018-01-31', '2018-02-28', '2018-03-31', 
                         '2018-04-30', '2018-05-31', '2018-06-30', 
                         '2018-07-31', '2018-08-31', '2018-09-30', 
                         '2018-10-31', '2018-11-30', '2018-12-31'), as.Date) %>% 
  purrr::reduce(c)

dummy_df <- tibble(

  id = rep(c("a", "b", "c"), each =  12),
  date = rep(date_vars, 3),
  value = runif(36, 1, 10)

)

The function below takes a data frame, groups it by a variable (using rlang's syms function), then creates a new summary column by adding up all values where the date is greater than or equal to some cutoff date. Here I am summing 3 months of 'value'.

agg_by_period <- function(df, date_period, period, grouping, new_col_prefix){

  grouping_vars <- syms(grouping)

  new_sum_column <- quo_name(paste0(new_col_prefix, "sum_", period, 'm'))

  df %>% 
    group_by(!!!grouping_vars) %>% 
    summarize(!!new_sum_column := sum(value[date >= date_period], na.rm = T)) %>% 
    select(!!!grouping_vars, !!sym(new_sum_column))

}


agg_by_period(df = dummy_df, 
              date_period = as.Date('2018-10-31'), 
              grouping = 'id',
              period = 3,
              new_col_prefix = 'new_'
)


# A tibble: 3 x 2
  id    new_sum_3m
  <chr>      <dbl>
1 a           7.00
2 b          11.9 
3 c          18.1 


Great! My question is specifically about making 'value' dynamic inside the function, for when this column is named something other than "value". My naive attempt to pass in this column using sym(), and the resulting error, follows:



agg_by_period2 <- function(df, date_period, period, grouping, new_col_prefix, 
                          value_var){

  grouping_vars <- syms(grouping)

  new_sum_column = quo_name(paste0(new_col_prefix, "sum_", period, 'm'))

  value_var_col <- sym(value_var)

  df %>% 
    group_by(!!!grouping_vars) %>% 
    summarize(!!new_sum_column := sum(!!value_var_col[date >= date_period], na.rm = T)) %>% 
    select(!!!grouping_vars, !!sym(new_sum_column))

}


agg_by_period2(df = dummy_df, 
              date_period = as.Date('2018-10-31'), 
              grouping = 'id',
              period = 3,
              new_col_prefix = 'new_',
              value_var = 'value'
)

 Error in `>=.default`(date, date_period) : 
  comparison (5) is possible only for atomic and list types 

The above function works when the date criterion ([date >= date_period]) is removed. Any help would be greatly appreciated.

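This appears to be an order-of-operations problem with !! and [. Because [ binds more tightly than !, the expression !!value_var_col[date >= date_period] is parsed as !!(value_var_col[date >= date_period]), so the subsetting is attempted before the unquoting. It looks like you just need to wrap the unquoted symbol in parentheses: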

  df %>% 
    group_by(!!!grouping_vars) %>% 
    summarize(!!new_sum_column := sum((!!value_var_col)[date >= date_period], na.rm = T)) %>% 
    select(!!!grouping_vars, !!sym(new_sum_column))
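
The precedence issue can be checked in base R by quoting both forms and inspecting the resulting call trees. This is only an illustrative sketch: the names x and i are hypothetical stand-ins for value_var_col and the date filter, and nothing is evaluated.

```r
# Hypothetical stand-ins: `x` for value_var_col, `i` for date >= date_period.
unparenthesized <- quote(!!x[i])
parenthesized   <- quote((!!x)[i])

# Without parentheses the outermost call is `!`, and the subset x[i]
# sits inside both bangs, so it would be handled first.
outer_unparenthesized <- as.character(unparenthesized[[1]])
inner_unparenthesized <- as.character(unparenthesized[[2]][[2]][[1]])

# With parentheses the outermost call is `[`, applied to the result of (!!x).
outer_parenthesized <- as.character(parenthesized[[1]])

print(c(outer_unparenthesized, inner_unparenthesized, outer_parenthesized))
```

The three printed function names show where `[` sits in each tree: nested inside the bangs in the first form, and at the top level in the second.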

Note the (!!value_var_col) rather than just !!value_var_col. This way the unquoting happens before the subsetting.
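
For reference, here is a minimal sketch of the whole function with the parenthesized unquote applied, next to an alternative that looks the column up by name through the .data pronoun. The function names agg_by_period_fixed and agg_by_period_data, the data frame demo_df and its amount column are all made up for this illustration. The .data variant is a swapped-in technique rather than part of the answer above, and the trailing select() from the question is left out on the assumption that summarize() already returns only the grouping and summary columns.

```r
library(dplyr)
library(rlang)

# Hypothetical name: the question's function with the parenthesized unquote.
agg_by_period_fixed <- function(df, date_period, period, grouping,
                                new_col_prefix, value_var) {
  grouping_vars  <- syms(grouping)
  new_sum_column <- paste0(new_col_prefix, "sum_", period, "m")
  value_var_col  <- sym(value_var)

  df %>%
    group_by(!!!grouping_vars) %>%
    summarize(
      # (!!value_var_col) is unquoted first, then subset by the date filter.
      !!new_sum_column := sum((!!value_var_col)[date >= date_period],
                              na.rm = TRUE)
    )
}

# Hypothetical alternative: look the column up by its string name via .data.
agg_by_period_data <- function(df, date_period, period, grouping,
                               new_col_prefix, value_var) {
  grouping_vars  <- syms(grouping)
  new_sum_column <- paste0(new_col_prefix, "sum_", period, "m")

  df %>%
    group_by(!!!grouping_vars) %>%
    summarize(
      !!new_sum_column := sum(.data[[value_var]][.data$date >= date_period],
                              na.rm = TRUE)
    )
}

# Small made-up data set whose value column is not called "value".
demo_df <- tibble(
  id     = c("a", "a", "b", "b"),
  date   = as.Date(c("2018-09-30", "2018-10-31", "2018-10-31", "2018-11-30")),
  amount = c(1, 2, 3, 4)
)

res_fixed <- agg_by_period_fixed(demo_df, as.Date("2018-10-31"), 3,
                                 "id", "new_", "amount")
res_data  <- agg_by_period_data(demo_df, as.Date("2018-10-31"), 3,
                                "id", "new_", "amount")
print(res_fixed)
```

Both variants should give the same sums per id. The .data form avoids the precedence question altogether because no !! appears next to the subsetting bracket.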
