
Mutate by group based on a conditional

I am trying to add a summary column to a data frame. The summary statistic should be attached to every row, but the statistic itself should be calculated only from the rows that meet a condition.

As an example, given this data frame:

x <- data.frame(usernum=rep(c(1,2,3,4),each=3),
                final=rep(c(TRUE,TRUE,FALSE,FALSE)),
                time=1:12)

I would like to add a user.mean column holding each user's mean time, where the mean is calculated only from rows where final is TRUE. I have tried:

library(tidyverse)

x %>% 
  group_by(usernum) %>%
  mutate(user.mean = mean(x$time[x$final==TRUE]))

but this gives the overall mean rather than a per-user mean. I have also tried:

x %>% 
  group_by(usernum) %>%
  filter(final==TRUE) %>% 
  mutate(user.mean = mean(time))

but this returns only the filtered data frame:

# A tibble: 6 x 4
# Groups:   usernum [4]
  usernum final  time user.mean
    <dbl> <lgl> <int>     <dbl>
1       1 TRUE      1       1.5
2       1 TRUE      2       1.5
3       2 TRUE      5       5.5
4       2 TRUE      6       5.5
5       3 TRUE      9       9  
6       4 TRUE     10      10 

How can I apply those means to every original row?

If we use x$ after group_by, it returns the entire column instead of only the values in the current group. Also, final is already a logical vector, so the == TRUE comparison is unnecessary:

library(dplyr)
x %>%
     group_by(usernum) %>% 
     mutate(user.mean = mean(time[final]))
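To make the difference concrete, here is a minimal sketch that computes both versions side by side on the question's data. The column name overall.mean and the result name res are chosen here for illustration only, and final is written with an explicit times = 3 rather than relying on recycling.

```r
library(dplyr)

# Same data as in the question; `final` is repeated explicitly to 12 rows
x <- data.frame(usernum = rep(c(1, 2, 3, 4), each = 3),
                final = rep(c(TRUE, TRUE, FALSE, FALSE), times = 3),
                time = 1:12)

res <- x %>%
  group_by(usernum) %>%
  mutate(
    # x$time is the full 12-row column, so every group gets the same value
    overall.mean = mean(x$time[x$final]),
    # bare column names refer only to the rows of the current group
    user.mean = mean(time[final])
  ) %>%
  ungroup()

unique(res$overall.mean)  # 5.5
unique(res$user.mean)     # 1.5 5.5 9.0 10.0
```

All 12 rows are kept, and each row carries the mean of its own user's final rows.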

The one case where $ can be used is with the .data pronoun, which refers to the current group's data:

x %>% 
    group_by(usernum) %>%
    mutate(user.mean = mean(.data$time[.data$final]))
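One edge case the question's data does not exercise: if a user has no rows where final is TRUE, the subset is empty and mean() returns NaN. The sketch below shows one possible way to return NA instead. The data frame y, user 5, and the column names raw.mean and safe.mean are hypothetical and not part of the original question.

```r
library(dplyr)

# Hypothetical data: user 5 has no rows with final == TRUE
y <- data.frame(usernum = rep(c(1, 5), each = 3),
                final = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
                time = 1:6)

res <- y %>%
  group_by(usernum) %>%
  mutate(
    # mean() of an empty vector is NaN for groups without any TRUE row
    raw.mean = mean(.data$time[.data$final]),
    # guard: fall back to NA when the group has no qualifying rows
    safe.mean = if (any(final)) mean(time[final]) else NA_real_
  ) %>%
  ungroup()
```

Whether NaN or NA is preferable depends on how the column is used downstream; the guard simply makes the choice explicit.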
