Is it possible to combine summarise with summarise_at in a single group_by with dplyr?
Edit: I just realized that the side column in the data is not used at all, so please ignore it for the purposes of this example.
I have a large data frame of basketball play-by-play data, and I would like to perform a group_by, summarise, and summarise_at on my data. Below is a subset of my data frame:
> dput(zed)
structure(list(side = c("right", "right", "right", "right", "right",
"right", "left", "right", "right", "right", "left", "right",
"left", "left", "left", "right", "right", "right", "left", "right"
), result = c("twopointmiss", "twopointmade", "twopointmade",
"twopointmiss", "twopointmade", "twopointmade", "twopointmiss",
"twopointmade", "twopointmade", "twopointmade", "twopointmade",
"twopointmade", "twopointmiss", "twopointmiss", "twopointmiss",
"twopointmiss", "twopointmade", "twopointmade", "twopointmiss",
"twopointmiss"), zonenumber = c(1, 1, 1, 1, 2, 3, 2, 3, 2, 3,
4, 4, 4, 1, 1, 2, 3, 2, 3, 4), team = c("Bos", "Bos", "Bos",
"Bos", "Bos", "Bos", "Bos", "Bos", "Bos", "Bos", "Min", "Min",
"Min", "Min", "Min", "Min", "Min", "Min", "Min", "Min")), row.names = c(3L,
5L, 8L, 14L, 17L, 23L, 28L, 30L, 39L, 41L, 42L, 43L, 47L, 52L,
54L, 58L, 60L, 63L, 69L, 72L), class = "data.frame")
> zed
    side       result zonenumber team
3  right twopointmiss          1  Bos
5  right twopointmade          1  Bos
8  right twopointmade          1  Bos
14 right twopointmiss          1  Bos
17 right twopointmade          2  Bos
23 right twopointmade          3  Bos
28  left twopointmiss          2  Bos
30 right twopointmade          3  Bos
39 right twopointmade          2  Bos
41 right twopointmade          3  Bos
42  left twopointmade          4  Min
43 right twopointmade          4  Min
47  left twopointmiss          4  Min
52  left twopointmiss          1  Min
54  left twopointmiss          1  Min
58 right twopointmiss          2  Min
60 right twopointmade          3  Min
63 right twopointmade          2  Min
69  left twopointmiss          3  Min
72 right twopointmiss          4  Min
In the example below, I only use summarise, because I currently do not know how to use summarise and summarise_at with the same group_by call:
> grouped.df <- zed %>%
+ dplyr::group_by(team) %>%
+ dplyr::summarise(
+ shotsMade = sum(result == "twopointmade"),
+ shotsAtt = n(),
+ shotsPct = round(shotsMade / shotsAtt),
+ points = 2 * shotsMade,
+
+ z1Made = sum(zonenumber == 1),
+ z2Made = sum(zonenumber == 2),
+ z3Made = sum(zonenumber == 3),
+ z4Made = sum(zonenumber == 4)
+ )
> grouped.df
# A tibble: 2 x 9
  team  shotsMade shotsAtt shotsPct points z1Made z2Made z3Made z4Made
  <chr>     <int>    <int>    <dbl>  <dbl>  <int>  <int>  <int>  <int>
1 Bos           7       10        1     14      4      3      3      0
2 Min           4       10        0      8      2      2      2      4
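In the example above, I would like to create the first four columns (shotsMade, shotsAtt, shotsPct, points) with summarise, and the z#Made columns with summarise_at. In my full data, I plan to create about 30 unique-ish columns with summarise and about 80 similar-ish columns with summarise_at.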
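This is a small example because I did not want to bring my whole data frame into the question. If I can get summarise_at and summarise working together in the example above, I will be able to do the same on my full data frame.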
Any thoughts on this are greatly appreciated, as I am particularly keen to get better at using the _at functions in dplyr. Thanks!
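I don't think there is a way to use summarise_at and summarise together, because once the first one has dropped many rows and columns, we can no longer perform the second one.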
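To make the point above concrete, here is a minimal sketch of what is left after a single summarise call. The shots data frame is a made-up stand-in for zed (its values are not from the question); it only assumes the same team, result and zonenumber columns.

```r
library(dplyr)

# Hypothetical stand-in for `zed`; values are invented for illustration
shots <- data.frame(
  team       = c("Bos", "Bos", "Bos", "Min", "Min", "Min"),
  result     = c("twopointmade", "twopointmiss", "twopointmade",
                 "twopointmiss", "twopointmade", "twopointmiss"),
  zonenumber = c(1, 2, 1, 3, 3, 4),
  stringsAsFactors = FALSE
)

# After summarise(), only the grouping column and the new summary column remain
summarised <- shots %>%
  group_by(team) %>%
  summarise(shotsAtt = n())

names(summarised)

# zonenumber is gone, so a later summarise_at(vars(zonenumber), ...) has nothing to work on
"zonenumber" %in% names(summarised)
```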
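So, instead, we can use mutate and mutate_at, and then drop certain rows (and perhaps columns). The difference between this and somehow magically applying summarise and summarise_at together is that this approach does not drop any variables. Whether that is beneficial depends on your case. Below I add an extra line, select(-one_of(setdiff(names(zed), "team"))), which removes all the columns that a summarise and summarise_at combination would have dropped.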
library(dplyr)

zed$zonenumber2 <- zed$zonenumber # Example: a second "zone" column for mutate_at to pick up
zed %>%
  group_by(team) %>%
  mutate(
    shotsMade = sum(result == "twopointmade"),
    shotsAtt = n(),
    shotsPct = round(shotsMade / shotsAtt),
    points = 2 * shotsMade) %>%
  # funs() is deprecated as of dplyr 0.8.0; a named list of lambdas replaces it
  mutate_at(
    vars(contains("zone")),
    .funs = list(Made1 = ~ sum(. == 1), Made2 = ~ sum(. == 2),
                 Made3 = ~ sum(. == 3), Made4 = ~ sum(. == 4))) %>%
  filter(!duplicated(team)) %>% # keep one row per team
  select(-one_of(setdiff(names(zed), "team"))) # May want to remove
# A tibble: 2 x 13
# Groups: team [2]
#   team  shotsMade shotsAtt shotsPct points zonenumber_Made1 zonenumber2_Mad… zonenumber_Made2
#   <chr>     <int>    <int>    <dbl>  <dbl>            <int>            <int>            <int>
# 1 Bos           7       10        1     14                4                4                3
# 2 Min           4       10        0      8                2                2                2
# … with 5 more variables: zonenumber2_Made2 <int>, zonenumber_Made3 <int>,
# zonenumber2_Made3 <int>, zonenumber_Made4 <int>, zonenumber2_Made4 <int>
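As a side note, dplyr 1.0.0 and later provide across(), which can be called inside summarise, so hand-written and repeated columns can be produced in one call. The following is only a sketch under that version assumption, not part of the approach above; the shots data frame is a made-up stand-in for zed, and only two of the four zone functions are shown.

```r
library(dplyr)

# Hypothetical stand-in for `zed`; values are invented for illustration
shots <- data.frame(
  team       = c("Bos", "Bos", "Bos", "Min", "Min", "Min"),
  result     = c("twopointmade", "twopointmiss", "twopointmade",
                 "twopointmiss", "twopointmade", "twopointmiss"),
  zonenumber = c(1, 2, 1, 3, 3, 4),
  stringsAsFactors = FALSE
)

per_team <- shots %>%
  group_by(team) %>%
  summarise(
    # hand-written columns, as in a plain summarise()
    shotsMade = sum(result == "twopointmade"),
    shotsAtt  = n(),
    # repeated columns: apply a named list of functions to the selected column(s);
    # the default output names follow the pattern {column}_{function}
    across(
      zonenumber,
      list(Made1 = ~ sum(.x == 1), Made3 = ~ sum(.x == 3))
    )
  )

per_team
```

The first argument of across() accepts the same selection helpers as vars(), for example contains("zone"), so the roughly 80 similar columns could be selected by name pattern rather than listed one by one.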