Mutate by group based on a conditional
I am trying to add a summary column to a dataframe. Although the summary statistic should be applied to every row, the statistic itself should only be calculated from the rows that meet a condition.
As an example, given this dataframe:
x <- data.frame(usernum=rep(c(1,2,3,4),each=3),
                final=rep(c(TRUE,TRUE,FALSE,FALSE)),
                time=1:12)
I would like to add a usernum.mean column, but where the mean is only calculated when final=TRUE. I have tried:
library(tidyverse)
x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(x$time[x$final==TRUE]))
but this gives an overall mean, rather than one per user. I have also tried:
x %>%
  group_by(usernum) %>%
  filter(final==TRUE) %>%
  mutate(user.mean = mean(time))
but this only returns the filtered dataframe:
# A tibble: 6 x 4
# Groups:   usernum [4]
  usernum final  time user.mean
    <dbl> <lgl> <int>     <dbl>
1       1 TRUE      1       1.5
2       1 TRUE      2       1.5
3       2 TRUE      5       5.5
4       2 TRUE      6       5.5
5       3 TRUE      9       9
6       4 TRUE     10      10
How can I apply those means to every original row?
If we use x$ after the group_by, it returns the entire column instead of only the values in that particular group. Second, final is already a logical vector, so we don't need the == TRUE comparison:
library(dplyr)
x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(time[final]))
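As a quick check, here is a minimal sketch that rebuilds the example data and applies the grouped mutate above. The name `res` is only for illustration, and `times = 3` is written out explicitly instead of relying on data.frame recycling the four-element vector.

```r
library(dplyr)

# example data from the question: 4 users, 3 rows each
x <- data.frame(usernum = rep(c(1, 2, 3, 4), each = 3),
                final   = rep(c(TRUE, TRUE, FALSE, FALSE), times = 3),
                time    = 1:12)

res <- x %>%
  group_by(usernum) %>%
  # inside mutate, time and final refer to the current group only,
  # so time[final] keeps just this user's final rows
  mutate(user.mean = mean(time[final])) %>%
  ungroup()

# all 12 original rows are kept; each user's mean is repeated on its rows
res$user.mean
```

With this data every user has at least one final row. A user with none would presumably get NaN, since the mean of an empty vector is NaN.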
The one option where we can use $ is with the .data pronoun, which refers to the current group rather than the whole dataframe:
x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(.data$time[.data$final]))
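If the filter() attempt from the question feels more natural, another possible route (not part of the answer above, just a sketch) is to summarise the filtered rows to one mean per user and then join that back onto the original dataframe. The names `user_means` and `joined` are hypothetical.

```r
library(dplyr)

# example data from the question: 4 users, 3 rows each
x <- data.frame(usernum = rep(c(1, 2, 3, 4), each = 3),
                final   = rep(c(TRUE, TRUE, FALSE, FALSE), times = 3),
                time    = 1:12)

# one row per user: mean of time over the final rows only
user_means <- x %>%
  filter(final) %>%
  group_by(usernum) %>%
  summarise(user.mean = mean(time))

# attach each user's mean to every original row;
# a user with no final rows would get NA here
joined <- left_join(x, user_means, by = "usernum")

joined$user.mean
```

This takes two steps instead of one, so the single mutate is usually simpler. The join mainly helps when the per-user summary is also needed on its own.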