Summarizing over the "other" groups with dplyr

Question

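I need to summarize a grouped data frame (note: a dplyr solution is very much appreciated, but it is not mandatory), computing something on each group (simple) and the same thing on the "other" groups.

Minimal example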

if (!require(pacman)) install.packages("pacman")  # the package name must be a quoted string
pacman::p_load(dplyr)

df <- tibble(  # tibble() replaces the deprecated data_frame()
    group = c('a', 'a', 'b', 'b', 'c', 'c'),
    value = c(1, 2, 3, 4, 5, 6)
)

res <- df %>%
    group_by(group) %>%
    summarize(
        median        = median(value)
#        median_other  = ... ??? ... # I need the median of all "other"
                                     # groups
#        median_before = ... ??? ... # I need the median of some groups (e.g.
                                     # those "before" in alphabetic order,
                                     # but clearly any rule that is a
                                     # "selection function" depending on
                                     # the current group is fine)
    )

My expected result is the following

group    median    median_other    median_before
  a        1.5         4.5               NA
  b        3.5         3.5               1.5
  c        5.5         2.5               2.5

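I have searched Google for strings like "dplyr summarize excluding groups" and "dplyr summarise other than group", and I have searched the dplyr documentation, but I could not find a solution.

The approach in "How to summarize value not matching the group using dplyr" does not apply here, because it only works with sums, i.e. it is a "function-specific" solution (and it relies on a simple arithmetic function that does not take the variability within each group into account). What about more complex functions (e.g. mean, sd, or a user-defined function)? :-)

Thanks all

PS: summarize() is just an example; the same question applies to mutate() or any other dplyr function that works by group.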

Answer 1

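I don't think it is possible in general to perform operations on other groups within summarise() (i.e. I think the other groups are not "visible" while a given group is being summarised). You can define your own functions and use them in mutate() to apply them to a specific variable. For your updated example, you can use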

calc_med_other <- function(x) sapply(seq_along(x), function(i) median(x[-i]))
calc_med_before <- function(x) sapply(seq_along(x), function(i) ifelse(i == 1, NA, median(x[seq(i - 1)])))

df %>%
    group_by(group) %>%
    summarize(med = median(value)) %>%
    mutate(
        med_other = calc_med_other(med),
        med_before = calc_med_before(med)
    )
#   group   med med_other med_before
#   (chr) (dbl)     (dbl)      (dbl)
#1     a   1.5       4.5         NA
#2     b   3.5       3.5        1.5
#3     c   5.5       2.5        2.5
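
Note that the helpers above work on the per-group medians, so med_other is a median of medians; it happens to match the expected values for this data. The following is a minimal sketch, assuming you instead want the statistic computed over the raw values of all other groups, and for an arbitrary function. summarise_others is a hypothetical helper name (not part of dplyr), and the group column is assumed to be a character vector so that `<` gives alphabetic order.

```r
# Hypothetical helper (not part of dplyr): for each group, apply `f` to the
# values of that group, of all other groups, and of all groups sorting before it.
# Assumes `group` is a character vector and `f` returns a single number.
summarise_others <- function(value, group, f = median) {
    groups <- sort(unique(group))
    data.frame(
        group  = groups,
        own    = vapply(groups, function(g) f(value[group == g]),
                        numeric(1), USE.NAMES = FALSE),
        other  = vapply(groups, function(g) f(value[group != g]),
                        numeric(1), USE.NAMES = FALSE),
        before = vapply(groups, function(g) {
            earlier <- value[group < g]
            # the first group has nothing before it
            if (length(earlier) == 0) NA_real_ else f(earlier)
        }, numeric(1), USE.NAMES = FALSE),
        row.names = NULL,
        stringsAsFactors = FALSE
    )
}

df <- data.frame(
    group = c('a', 'a', 'b', 'b', 'c', 'c'),
    value = c(1, 2, 3, 4, 5, 6),
    stringsAsFactors = FALSE
)

res <- summarise_others(df$value, df$group, median)
print(res)
```

Passing mean, sd, or a user-defined function as `f` should work the same way, as long as it returns a single number.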

Answer 2

Here is my solution:

res <- df %>%
  group_by(group) %>%
  summarise(med_group = median(value),
            med_other = (median(df$value[df$group != group]))) %>% 
  mutate(med_before = lag(med_group))

> res
Source: local data frame [3 x 4]

      group med_group med_other med_before
  (chr)     (dbl)     (dbl)      (dbl)
1     a       1.5       4.5         NA
2     b       3.5       3.5        1.5
3     c       5.5       2.5        3.5

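I was trying to come up with a pure dplyr solution, but base R subsetting works just fine: median(df$value[df$group != group]) returns the median of all observations that do not belong to the current group.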
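
One caveat: lag(med_group) returns the median of the immediately preceding group, which is why med_before is 3.5 for group c above, while the question expects 2.5 (the median of all values in groups a and b). A minimal sketch of a variant that uses the same base R subsetting for both columns, assuming group is a character column so that `<` means "earlier in alphabetic order":

```r
library(dplyr)

df <- tibble(
    group = c('a', 'a', 'b', 'b', 'c', 'c'),
    value = c(1, 2, 3, 4, 5, 6)
)

res <- df %>%
    group_by(group) %>%
    summarise(
        med_group  = median(value),
        # group[1] is the label of the current group
        med_other  = median(df$value[df$group != group[1]]),
        # all values from groups sorting before the current one;
        # median() of an empty vector is NA, which covers the first group
        med_before = median(df$value[df$group < group[1]])
    )

print(res)
```

Using group[1] instead of group avoids relying on vector recycling in the comparison. Any other function, such as mean or sd, can be swapped in for median.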

I hope this helps you solve your problem.

Answer 1
2 2016-04-06 12:06:41

Answer 2
1 Accepted 2016-04-06 22:35:41
