How do I use R dplyr's summarize to count the number of rows that match a condition?

Question

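I have a dataset that I want to summarize. First, I want the total score for home and away games, which I can do. However, I also want to know how many outliers (defined as a score above 300) there are in each subcategory (home, away).

If I weren't using summarize, I know dplyr has the count() function, but I'd like this solution to live inside my summarize() call. Here is what I have and what I tried, which fails to run: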

#Test data
library(dplyr)

test <- tibble(score = c(100, 150, 200, 301, 150, 345, 102, 131),
               location = c("home", "away", "home", "away", "home", "away", "home", "away"),
               more_than_300 = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))


#attempt 1, count rows that match a criterion
test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            n_outliers = nrow(.[more_than_300 == FALSE]))

Answer 1

You can use sum on a logical vector: it is automatically coerced to numeric (TRUE counts as 1, FALSE as 0), so all you need is the following:

test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            n_outliers  = sum(more_than_300))
#> # A tibble: 2 x 3
#>   location total_score n_outliers
#>   <chr>          <dbl>      <int>
#> 1 away             927          2
#> 2 home             552          0
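
To see the coercion in isolation, here is a minimal sketch on a made-up logical vector (the name flags is only for illustration and is not part of the question's data):

```r
# Hypothetical logical vector, just to illustrate the coercion
flags <- c(FALSE, TRUE, FALSE, TRUE)

as.integer(flags)     # 0 1 0 1
n_true <- sum(flags)  # 2: each TRUE counts as 1
```

This is the same thing sum(more_than_300) does within each group.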

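Alternatively, if these are the only 3 columns you have, an equivalent that keeps the original column names (score and more_than_300) is: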

test %>%
  group_by(location) %>%
  summarize(across(everything(), sum))
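
If the data frame had other columns that cannot be summed, everything() would presumably fail, so one option is to name the columns explicitly. A sketch, rebuilding the question's test data so it runs on its own:

```r
library(dplyr)

# Same test data as in the question
test <- tibble(score = c(100, 150, 200, 301, 150, 345, 102, 131),
               location = c("home", "away", "home", "away", "home", "away", "home", "away"),
               more_than_300 = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))

# Sum only the two named columns within each location
res_across <- test %>%
  group_by(location) %>%
  summarize(across(c(score, more_than_300), sum))
```

The result keeps the names score and more_than_300 rather than total_score and n_outliers.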

In fact, you don't need to create the more_than_300 column at all. This is enough:

test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            n_outliers  = sum(score > 300))
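
One caveat if your real data has missing scores: sum() returns NA as soon as a group contains an NA. A sketch of one way to handle that, using a hypothetical variant of the test data (test_na, with one score replaced by NA) and also showing mean() on the same condition to get the share of outliers:

```r
library(dplyr)

# Hypothetical variant of the question's data with one missing score
test_na <- tibble(score = c(100, 150, 200, 301, NA, 345, 102, 131),
                  location = c("home", "away", "home", "away", "home", "away", "home", "away"))

res_na <- test_na %>%
  group_by(location) %>%
  summarize(total_score   = sum(score, na.rm = TRUE),         # ignore missing scores
            n_outliers    = sum(score > 300, na.rm = TRUE),   # count of TRUE values
            prop_outliers = mean(score > 300, na.rm = TRUE))  # share of TRUE values
```

Whether dropping the NAs is the right choice depends on what a missing score means in your data.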

Answer 2

In base R, we can try aggregate like this:

> aggregate(.~location,test,sum)
  location score more_than_300
1     away   927             2
2     home   552             0
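
If the more_than_300 column does not exist yet, one possibility is to add the flag first and then aggregate. A sketch using a plain data.frame copy of the question's data (the column name n_outliers is my choice):

```r
# Base R only: the question's data without the precomputed flag
test_df <- data.frame(score = c(100, 150, 200, 301, 150, 345, 102, 131),
                      location = c("home", "away", "home", "away", "home", "away", "home", "away"))

# Flag the outliers, then sum both columns per location
test_df$n_outliers <- test_df$score > 300
res_agg <- aggregate(cbind(score, n_outliers) ~ location, test_df, sum)
```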

Answer 3

In base R, xtabs can be used to compute the sums per group.

xtabs(cbind(score, more_than_300) ~ ., test)
#location score more_than_300
#    away   927             2
#    home   552             0

Or compute the outliers on the fly and give the columns the desired names.

xtabs(cbind(total_score = score, n_outliers = score > 300) ~ location, test)
#location total_score n_outliers
#    away         927          2
#    home         552          0

Another option, also in base, would be rowsum.

with(test, rowsum(cbind(total_score = score, n_outliers = score > 300), location))
#     total_score n_outliers
#away         927          2
#home         552          0

xtabs and rowsum are specialized for computing sums per group and might perform well on this task.
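
To check the performance on your own machine, a rough timing sketch could look like the following. The simulated data (big, 100,000 rows) is made up for illustration, and a single system.time() call is only a crude measurement, so treat the numbers as indicative at best:

```r
library(dplyr)

set.seed(1)  # simulated data, not from the question
n <- 1e5
big <- tibble(score = sample(50:400, n, replace = TRUE),
              location = sample(c("home", "away"), n, replace = TRUE))

# dplyr version
t_dplyr <- system.time(
  res_dplyr <- big %>%
    group_by(location) %>%
    summarize(total_score = sum(score),
              n_outliers  = sum(score > 300))
)

# base rowsum version
t_rowsum <- system.time(
  res_rowsum <- with(big, rowsum(cbind(total_score = score,
                                       n_outliers = score > 300), location))
)

t_dplyr
t_rowsum
```

Both approaches should return the same per-group sums, so only the timings differ.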


3 solutions

Solution 1
7 Accepted 2022-04-19 12:24:46

Solution 2
4 2022-04-19 12:31:44

Solution 3
3 2022-04-19 12:56:12
