Is it possible to combine summarise and summarise_at in a single group_by with dplyr?

Question

Edit: I just realized that the side column in the data isn't used at all, so please ignore it for the purposes of this example.

I have a large data frame of play-by-play basketball data, and I want to group_by, summarise, and summarise_at my data. Below is a subset of my data frame:

> dput(zed)
structure(list(side = c("right", "right", "right", "right", "right", 
"right", "left", "right", "right", "right", "left", "right", 
"left", "left", "left", "right", "right", "right", "left", "right"
), result = c("twopointmiss", "twopointmade", "twopointmade", 
"twopointmiss", "twopointmade", "twopointmade", "twopointmiss", 
"twopointmade", "twopointmade", "twopointmade", "twopointmade", 
"twopointmade", "twopointmiss", "twopointmiss", "twopointmiss", 
"twopointmiss", "twopointmade", "twopointmade", "twopointmiss", 
"twopointmiss"), zonenumber = c(1, 1, 1, 1, 2, 3, 2, 3, 2, 3, 
4, 4, 4, 1, 1, 2, 3, 2, 3, 4), team = c("Bos", "Bos", "Bos", 
"Bos", "Bos", "Bos", "Bos", "Bos", "Bos", "Bos", "Min", "Min", 
"Min", "Min", "Min", "Min", "Min", "Min", "Min", "Min")), row.names = c(3L, 
5L, 8L, 14L, 17L, 23L, 28L, 30L, 39L, 41L, 42L, 43L, 47L, 52L, 
54L, 58L, 60L, 63L, 69L, 72L), class = "data.frame")

>   zed
    side       result zonenumber team
3  right twopointmiss          1  Bos
5  right twopointmade          1  Bos
8  right twopointmade          1  Bos
14 right twopointmiss          1  Bos
17 right twopointmade          2  Bos
23 right twopointmade          3  Bos
28  left twopointmiss          2  Bos
30 right twopointmade          3  Bos
39 right twopointmade          2  Bos
41 right twopointmade          3  Bos
42  left twopointmade          4  Min
43 right twopointmade          4  Min
47  left twopointmiss          4  Min
52  left twopointmiss          1  Min
54  left twopointmiss          1  Min
58 right twopointmiss          2  Min
60 right twopointmade          3  Min
63 right twopointmade          2  Min
69  left twopointmiss          3  Min
72 right twopointmiss          4  Min

In the example below, I use only summarise, because I don't currently know how to use summarise and summarise_at within the same group_by call:

>   grouped.df <- zed %>%
+     dplyr::group_by(team) %>%
+     dplyr::summarise(
+       shotsMade = sum(result == "twopointmade"),
+       shotsAtt = n(),
+       shotsPct = round(shotsMade / shotsAtt),
+       points = 2 * shotsMade,
+       
+       z1Made = sum(zonenumber == 1),
+       z2Made = sum(zonenumber == 2),
+       z3Made = sum(zonenumber == 3),
+       z4Made = sum(zonenumber == 4)
+     )
>   grouped.df
# A tibble: 2 x 9
  team  shotsMade shotsAtt shotsPct points z1Made z2Made z3Made z4Made
  <chr>     <int>    <int>    <dbl>  <dbl>  <int>  <int>  <int>  <int>
1 Bos           7       10        1     14      4      3      3      0
2 Min           4       10        0      8      2      2      2      4

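In the example above, I'd like to create the first 4 columns (shotsMade, shotsAtt, shotsPct, points) with summarise, and create the z#Made columns with summarise_at. In my full data, I plan to use summarise to create about 30 unique columns like these, and summarise_at to create about 80 similar, repetitive columns.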
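The summarise_at half of this request can be written on its own. The sketch below rebuilds the relevant columns of `zed` (side is omitted, as the edit suggests). It then computes the four zone counts with one function per zone. The name `zoneCounts` is only illustrative. When there is a single variable in `vars()` and several named functions, summarise_at names the new columns after the functions.

```r
library(dplyr)

# Reduced copy of the question's data (only the columns used here)
zed <- data.frame(
  result = c("twopointmiss", "twopointmade", "twopointmade", "twopointmiss",
             "twopointmade", "twopointmade", "twopointmiss", "twopointmade",
             "twopointmade", "twopointmade", "twopointmade", "twopointmade",
             "twopointmiss", "twopointmiss", "twopointmiss", "twopointmiss",
             "twopointmade", "twopointmade", "twopointmiss", "twopointmiss"),
  zonenumber = c(1, 1, 1, 1, 2, 3, 2, 3, 2, 3, 4, 4, 4, 1, 1, 2, 3, 2, 3, 4),
  team = rep(c("Bos", "Min"), each = 10),
  stringsAsFactors = FALSE
)

# One column + several named functions: the output columns are
# named after the functions (z1Made ... z4Made)
zoneCounts <- zed %>%
  group_by(team) %>%
  summarise_at(
    vars(zonenumber),
    list(z1Made = ~ sum(. == 1), z2Made = ~ sum(. == 2),
         z3Made = ~ sum(. == 3), z4Made = ~ sum(. == 4))
  )
zoneCounts
```

This reproduces the z#Made columns from the summarise version above. With 80 such columns, the function list could be generated programmatically rather than typed out.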

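This is a small example, since I didn't want to bring my whole data frame into the question. If I can get summarise_at and summarise working together in the example above, I'll be able to apply the same approach to my full data frame.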

Any thoughts on this are greatly appreciated, as I'm especially keen to get better at using the _at functions in dplyr. Thanks!

Answer 1

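I don't think there is a way to use summarise_at and summarise together. Once the first one has run, most rows and columns are gone, so the second one has nothing left to work on.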
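The point about lost columns can be checked directly. This sketch uses a tiny made-up stand-in for `zed`, not the question's data. It shows that after a grouped summarise only the grouping column and the new summary columns remain. A following `summarise_at(vars(zonenumber), ...)` in the same pipe would therefore fail, because `zonenumber` no longer exists.

```r
library(dplyr)

# Tiny hypothetical stand-in for the question's data
zed <- data.frame(
  result = c("twopointmade", "twopointmiss", "twopointmade"),
  zonenumber = c(1, 2, 4),
  team = c("Bos", "Bos", "Min"),
  stringsAsFactors = FALSE
)

step1 <- zed %>%
  group_by(team) %>%
  summarise(shotsMade = sum(result == "twopointmade"), shotsAtt = n())

names(step1)                     # "team" "shotsMade" "shotsAtt"
"zonenumber" %in% names(step1)   # FALSE: nothing left for summarise_at
```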

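So instead, we can use mutate and mutate_at, and then drop some rows (and perhaps columns) at the very end. This gives the combined result of summarise and summarise_at. The difference is that this approach does not drop any variables by itself, and whether that is useful depends on your case. Below I added an extra line, select(-one_of(setdiff(names(zed), "team"))), which removes all the columns that a summarise/summarise_at combination would have dropped.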

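library(dplyr)

# Example only: a second "zone" column, to show mutate_at handling several columns
zed$zonenumber2 <- zed$zonenumber
zed %>%
  group_by(team) %>%
  # mutate() keeps every row, so the original columns stay available
  mutate(
    shotsMade = sum(result == "twopointmade"),
    shotsAtt = n(),
    shotsPct = round(shotsMade / shotsAtt),
    points = 2 * shotsMade) %>%
  # apply the same four counts to every column whose name contains "zone"
  # (a list of formulas replaces funs(), which is deprecated since dplyr 0.8.0)
  mutate_at(
    vars(contains("zone")),
    .funs = list(Made1 = ~ sum(. == 1), Made2 = ~ sum(. == 2),
                 Made3 = ~ sum(. == 3), Made4 = ~ sum(. == 4))) %>%
  # keep one row per team, as summarise would
  filter(!duplicated(team)) %>%
  select(-one_of(setdiff(names(zed), "team"))) # optional: drops the original per-shot columns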
# A tibble: 2 x 13
# Groups:   team [2]
#   team  shotsMade shotsAtt shotsPct points zonenumber_Made1 zonenumber2_Mad… zonenumber_Made2
#   <chr>     <int>    <int>    <dbl>  <dbl>            <int>            <int>            <int>
# 1 Bos           7       10        1     14                4                4                3
# 2 Min           4       10        0      8                2                2                2
# … with 5 more variables: zonenumber2_Made2 <int>, zonenumber_Made3 <int>,
#   zonenumber2_Made3 <int>, zonenumber_Made4 <int>, zonenumber2_Made4 <int>
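For readers on dplyr 1.0.0 or later (released after this thread), summarise_at is superseded by `across()`. `across()` can be used inside the same `summarise()` call as hand-written columns, which is what the question asks for. The sketch below rebuilds the question's data without side and uses `.names = "{.fn}"` to name the new columns after the functions. This is one way to express the question's goal under those assumptions, not the method from the answer above.

```r
library(dplyr)  # across() requires dplyr >= 1.0.0

zed <- data.frame(
  result = c("twopointmiss", "twopointmade", "twopointmade", "twopointmiss",
             "twopointmade", "twopointmade", "twopointmiss", "twopointmade",
             "twopointmade", "twopointmade", "twopointmade", "twopointmade",
             "twopointmiss", "twopointmiss", "twopointmiss", "twopointmiss",
             "twopointmade", "twopointmade", "twopointmiss", "twopointmiss"),
  zonenumber = c(1, 1, 1, 1, 2, 3, 2, 3, 2, 3, 4, 4, 4, 1, 1, 2, 3, 2, 3, 4),
  team = rep(c("Bos", "Min"), each = 10),
  stringsAsFactors = FALSE
)

grouped.df <- zed %>%
  group_by(team) %>%
  summarise(
    # "unique" columns, written out one by one
    shotsMade = sum(result == "twopointmade"),
    shotsAtt = n(),
    shotsPct = round(shotsMade / shotsAtt),  # no digits: rounds to 0 or 1
    points = 2 * shotsMade,
    # repetitive columns, generated in the same call
    across(zonenumber,
           list(z1Made = ~ sum(.x == 1), z2Made = ~ sum(.x == 2),
                z3Made = ~ sum(.x == 3), z4Made = ~ sum(.x == 4)),
           .names = "{.fn}")
  )
grouped.df
```

The result has the same nine columns and values as the summarise-only version in the question. It needs no extra filter or select steps.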

Answer 1: score 2, accepted, 2019-01-16 23:31:19