Select value in group_by and summarize based on another column value in R
In the following example, how would I select a value (from mpg) per group (cyl) depending on a condition in another column (carb == 1)? Note that I also want to summarize another variable (averaging qsec per group). My best guess below gets an error:
library(dplyr)

mtcars %>%
  distinct(cyl, carb, .keep_all = TRUE) %>%
  group_by(cyl) %>%
  summarize(
    mpg = mpg[.$carb == 1],
    qsec = mean(qsec)
  )
If more than one row in a group can have carb equal to 1, keep in mind that summarise is expected to return a single row per group (or a single row overall when there is no grouping), so it is better to wrap the output in a list. Using .$carb breaks the grouping: the dot refers to the whole data frame passed through the pipe, not to the current group, so carb should be referenced by its bare name.
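To make the point about $ concrete, here is a small sketch. The names sizes, n_group and n_dot are illustrative and not from the original post. Inside summarise, a bare column name is evaluated per group, whereas the dot is the magrittr placeholder for the entire piped data frame, so .$carb always has one element per row of the full table.

```r
library(dplyr)

# Illustrative check: compare the length of the per-group column
# with the length of the column pulled from the `.` placeholder.
sizes <- mtcars %>%
  distinct(cyl, carb, .keep_all = TRUE) %>%
  group_by(cyl) %>%
  summarise(
    n_group = length(carb),    # rows in the current group
    n_dot   = length(.$carb)   # rows in the entire piped data frame
  )

sizes
```

n_dot is the same for every group while n_group varies, so indexing a group's mpg with .$carb == 1 uses a logical vector of the wrong length. Referring to carb by its bare name, as in the code that follows, keeps the index local to the group.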
library(tidyverse)

out <- mtcars %>%
  distinct(cyl, carb, .keep_all = TRUE) %>%
  group_by(cyl) %>%
  summarize(
    mpg = list(mpg[carb == 1]),
    qsec = mean(qsec)
  )

out
# A tibble: 3 x 3
#     cyl mpg        qsec
#   <dbl> <list>    <dbl>
# 1     4 <dbl [1]>  19.3
# 2     6 <dbl [1]>  17.1
# 3     8 <dbl [0]>  16.2
Looking at the output, for cyl 8 there is no row with carb equal to 1, which results in numeric(0).
By replacing the elements of length 0 with NA (here with purrr's map and a length check) and then calling unnest, every group keeps its row. Otherwise, as @Dave Gruenewald mentioned in the comments, that row is removed automatically while unnesting.

out %>%
  # turn zero-length elements into a single NA so unnest keeps the row
  mutate(mpg = map(mpg, ~ if (length(.x) == 0) NA_real_ else .x)) %>%
  unnest(mpg)
# A tibble: 3 x 3
#     cyl   mpg  qsec
#   <dbl> <dbl> <dbl>
# 1     4  22.8  19.3
# 2     6  21.4  17.1
# 3     8  NA    16.2
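As an alternative sketch, assuming tidyr 1.0.0 or later, unnest itself accepts a keep_empty argument that turns zero-length elements into a single row of missing values, so no separate replacement step is needed. The name kept is illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the list-column summary so this example runs on its own.
out <- mtcars %>%
  distinct(cyl, carb, .keep_all = TRUE) %>%
  group_by(cyl) %>%
  summarise(
    mpg = list(mpg[carb == 1]),
    qsec = mean(qsec)
  )

# keep_empty = TRUE keeps groups whose list element has length 0,
# filling the unnested column with NA instead of dropping the row.
kept <- out %>%
  unnest(mpg, keep_empty = TRUE)

kept
```

With the default keep_empty = FALSE, the row for cyl 8 would be dropped, which is the behaviour noted in the comments.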
Another option: if we already know that at most one element of carb equals 1 within each group, use an if/else condition in summarise.
mtcars %>%
  distinct(cyl, carb, .keep_all = TRUE) %>%
  group_by(cyl) %>%
  summarise(
    mpg = if (any(carb == 1)) mpg[carb == 1] else NA_real_,
    qsec = mean(qsec)
  )
# A tibble: 3 x 3
#     cyl   mpg  qsec
#   <dbl> <dbl> <dbl>
# 1     4  22.8  19.3
# 2     6  21.4  17.1
# 3     8  NA    16.2
However, it is safer to assume that there could be more than one carb value equal to 1 for each cyl, wrap the result in a list, and unnest it afterwards.
mtcars %>%
  distinct(cyl, carb, .keep_all = TRUE) %>%
  group_by(cyl) %>%
  summarise(
    mpg = list(if (any(carb == 1)) mpg[carb == 1] else NA_real_),
    qsec = mean(qsec)
  ) %>%
  # name the column explicitly; recent tidyr versions warn when it is omitted
  unnest(mpg)
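To illustrate why the list-plus-unnest form is the safer default, the sketch below drops the distinct step (an assumption made purely for illustration), so that several rows per cyl can have carb equal to 1. Each match then becomes its own row, and a group without any match still yields a single NA. The name multi is illustrative.

```r
library(dplyr)
library(tidyr)

# Without distinct(), a group may contain several rows with carb == 1.
multi <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    mpg = list(if (any(carb == 1)) mpg[carb == 1] else NA_real_),
    qsec = mean(qsec)
  ) %>%
  unnest(mpg)

# Count how many rows each cyl group contributes after unnesting.
count(multi, cyl)
```

The plain if/else version without list would fail or misbehave here, because it would try to return more than one value for a single summary row.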