Mutate by group based on a condition

Question

I am trying to add a summary column to a dataframe. Although the summary statistic should be applied to every row, the statistic itself should be calculated only from rows that meet a condition.

As an example, given this dataframe:

x <- data.frame(usernum=rep(c(1,2,3,4),each=3),
                final=rep(c(TRUE,TRUE,FALSE,FALSE)),
                time=1:12)

I would like to add a user.mean column, where the mean is calculated only from rows where final is TRUE. I have tried:

library(tidyverse)

x %>% 
  group_by(usernum) %>%
  mutate(user.mean = mean(x$time[x$final==TRUE]))

but this gives the overall mean rather than a mean per user. I have also tried:

x %>% 
  group_by(usernum) %>%
  filter(final==TRUE) %>% 
  mutate(user.mean = mean(time))

but this only returns the filtered dataframe:

# A tibble: 6 x 4
# Groups:   usernum [4]
  usernum final  time user.mean
    <dbl> <lgl> <int>     <dbl>
1       1 TRUE      1       1.5
2       1 TRUE      2       1.5
3       2 TRUE      5       5.5
4       2 TRUE      6       5.5
5       3 TRUE      9       9  
6       4 TRUE     10      10

How can I apply those means to every original row?

Answer 1

If we use x$ after the group_by, it returns the entire column instead of only the values in that particular group. Second, final is already a logical vector, so there is no need to compare it with == TRUE:

library(dplyr)
x %>%
     group_by(usernum) %>% 
     mutate(user.mean = mean(time[final]))

The one case where we can use $ is with the .data pronoun, which refers to the data of the current group:

x %>% 
    group_by(usernum) %>%
    mutate(user.mean = mean(.data$time[.data$final]))
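
As a quick check, here is a self-contained sketch that rebuilds the example data from the question and applies the grouped conditional mean. The name result and the trailing ungroup() are my own additions, and times = 3 just spells out the recycling that data.frame() does implicitly in the question.

```r
library(dplyr)

# Example data from the question (recycling of `final` written out explicitly)
x <- data.frame(usernum = rep(c(1, 2, 3, 4), each = 3),
                final = rep(c(TRUE, TRUE, FALSE, FALSE), times = 3),
                time = 1:12)

# Inside a grouped mutate(), `time` and `final` are the current group's values,
# so time[final] picks only that user's rows where final is TRUE
result <- x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(time[final])) %>%
  ungroup()

nrow(result)              # 12: every original row is kept
unique(result$user.mean)  # 1.5 5.5 9.0 10.0: one mean per user
```

Note that a user with no rows where final is TRUE would get NaN, because mean() of an empty vector is NaN. That case does not occur in this data.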
