
Select value in group_by and summarize based on another column value in R

In the following example, how would I select a value (from mpg) per group (cyl) depending on a condition in another column (carb == 1)? Note that I also want to summarize another variable (averaging qsec per group). My best guess below gets an error:

library(dplyr)
mtcars %>% 
    distinct(cyl, carb, .keep_all = TRUE) %>% 
    group_by(cyl) %>% 
    summarize(
        mpg = mpg[.$carb == 1],
        qsec = mean(qsec)
    )

summarise is expected to return a single row per group. If more than one row in a group has 'carb' equal to 1, or none does, the subset is not of length 1, so it is better to wrap the output in a list. Also, .$carb refers to the column of the whole data frame rather than of the current group, so using $ breaks the grouping; refer to the column simply as carb.

library(tidyverse)
out <- mtcars %>% 
        distinct(cyl, carb, .keep_all = TRUE) %>% 
        group_by(cyl) %>% 
        summarize(
          mpg = list(mpg[carb == 1]),
          qsec = mean(qsec)
        ) 

out
# A tibble: 3 x 3
#    cyl mpg        qsec
#  <dbl> <list>    <dbl>
#1     4 <dbl [1]>  19.3
#2     6 <dbl [1]>  17.1
#3     8 <dbl [0]>  16.2

Looking at the output, for 'cyl' 8 there is no row with 'carb' equal to 1, and that results in numeric(0).
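The grouping issue with $ can be checked directly. The following is a minimal sketch, assuming the magrittr pipe is used as in the question; the names d, lens, n_group and n_dot are introduced here only for illustration. It compares the length of carb as seen inside each group with the length of .$carb.

```r
library(dplyr)

# Same de-duplicated table as in the question
d <- mtcars %>%
  distinct(cyl, carb, .keep_all = TRUE)

lens <- d %>%
  group_by(cyl) %>%
  summarise(
    n_group = length(carb),   # length of carb within the current group
    n_dot   = length(.$carb)  # length of carb in the whole piped table
  )

lens
```

n_dot should be the same in every row (the row count of d), while n_group varies by group, which is why indexing mpg with .$carb == 1 does not line up with the group.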

Replacing the elements of length 0 with NA before calling unnest keeps that row. Otherwise, as @Dave Gruenewald mentioned in the comments, that row is removed automatically while unnesting.

# replace empty elements with NA so that unnest() keeps the row
# (the column order of the result may differ between tidyr versions)
out %>% 
  mutate(mpg = map(mpg, ~ if (length(.x) == 0) NA_real_ else .x)) %>% 
  unnest(mpg)
# A tibble: 3 x 3
#    cyl  qsec   mpg
#  <dbl> <dbl> <dbl>
#1     4  19.3  22.8
#2     6  17.1  21.4
#3     8  16.2  NA  
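As an alternative to replacing the empty elements by hand, unnest has a keep_empty argument. This is a sketch that assumes tidyr 1.0.0 or later; the name res is introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the list-column result from the answer
out <- mtcars %>%
  distinct(cyl, carb, .keep_all = TRUE) %>%
  group_by(cyl) %>%
  summarise(
    mpg  = list(mpg[carb == 1]),
    qsec = mean(qsec)
  )

# keep_empty = TRUE turns a length-0 element into a single row of NA
# instead of dropping the row
res <- out %>%
  unnest(mpg, keep_empty = TRUE)

res
```

With this option the row for 'cyl' 8 should stay in the result with mpg set to NA.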

Another option: if we already know that at most one 'carb' value per group equals 1, use an if/else condition in summarise.

mtcars %>%
    distinct(cyl, carb, .keep_all = TRUE) %>% 
    group_by(cyl) %>%
    summarise(
       mpg = if(any(carb == 1)) mpg[carb==1] else NA_real_,
       qsec = mean(qsec)
 )
# A tibble: 3 x 3
#     cyl   mpg  qsec
#   <dbl> <dbl> <dbl>
#1     4  22.8  19.3
#2     6  21.4  17.1
#3     8  NA    16.2
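Under the same assumption of at most one match per group, a shorter variant is to take the first element of the subset, because indexing an empty numeric vector with [1] gives NA. This is only a sketch, and it silently keeps the first match if a group happens to have several; the name res is introduced here only for illustration.

```r
library(dplyr)

res <- mtcars %>%
  distinct(cyl, carb, .keep_all = TRUE) %>%
  group_by(cyl) %>%
  summarise(
    # numeric(0)[1] is NA_real_, so groups without carb == 1 get NA
    mpg  = mpg[carb == 1][1],
    qsec = mean(qsec)
  )

res
```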

However, it is safer to assume that there could be more than one 'carb' value equal to 1 for each 'cyl', wrap the result in a list, and unnest it afterwards.

mtcars %>%
    distinct(cyl, carb, .keep_all = TRUE) %>% 
    group_by(cyl) %>%
    summarise(
       mpg = list(if(any(carb == 1)) mpg[carb==1] else NA_real_),
       qsec = mean(qsec)) %>%
    unnest(mpg)
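To see the multi-match case in action, the same pipeline can be run without the distinct step, so that a group may contain several rows with 'carb' equal to 1. This is a sketch for illustration only; the name res is not from the original answer.

```r
library(dplyr)
library(tidyr)

res <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    # keep every matching mpg value, or a single NA when nothing matches
    mpg  = list(if (any(carb == 1)) mpg[carb == 1] else NA_real_),
    qsec = mean(qsec)
  ) %>%
  unnest(mpg)

res
```

Each group now contributes one row per matching mpg value, with its group-level qsec mean repeated, and a group without any match contributes a single row with NA.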
