
Using dplyr to count multiple group-by variables

I have a dataset with several categorical variables:

library(dplyr)

# Sample data: one row per game (data_frame() is deprecated, so tibble() is used)
data <- tibble(
    HomeTeam = c("Team1", "Team2", "Team3", "Team4", "Team2", "Team2", "Team4",
                 "Team3", "Team2", "Team1", "Team3", "Team2"),
    AwayTeam = c("Team2", "Team1", "Team4", "Team3", "Team1", "Team4", "Team1",
                 "Team2", "Team3", "Team3", "Team4", "Team1"),
    HomeScore = c(10, 5, 12, 18, 17, 19, 23, 17, 34, 19, 8, 3),
    AwayScore = c(4, 16, 9, 19, 16, 4, 8, 21, 6, 5, 9, 17),
    Venue = c("Ground1", "Ground2", "Ground3", "Ground3", "Ground1", "Ground2",
              "Ground1", "Ground3", "Ground2", "Ground3", "Ground4", "Ground2"))

I basically want to summarise "HomeTeam" and "AwayTeam" into a new table, like this:

  HomeTeam NumberOfGamesHome NumberOfGamesAway
  <chr>                <int>             <int>
1 Team1                    2                 4
2 Team2                    5                 2
3 Team3                    3                 3
4 Team4                    2                 3

My current approach needs two separate blocks of code, followed by a join of the two tables:

HomeTeamCount <- data %>%
    group_by(HomeTeam) %>%
    summarise(NumberOfGamesHome = n())

AwayTeamCount <- data %>%
    group_by(AwayTeam) %>%
    summarise(NumberOfGamesAway = n())

Desired <- left_join(HomeTeamCount, AwayTeamCount,
                     by = c("HomeTeam" = "AwayTeam"))

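My actual dataset has a large number of categorical variables, and following the approach above seems laborious and inefficient.

Is there a way to use dplyr to group_by multiple categorical variables and produce the desired output? Or perhaps data.table?

I have looked at several other questions, such as this one and this one, but have not found an answer yet.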

Here is one possibility: use gather to reshape the data from wide to long, group by team, and count the number of home and away games.

library(tidyverse)
data %>%
    gather(key, Team) %>%
    group_by(Team) %>%
    summarise(
        NumberOfGamesHome = sum(key == "HomeTeam"),
        NumberOfGamesAway = sum(key == "AwayTeam"))
## A tibble: 4 x 3
#  Team  NumberOfGamesHome NumberOfGamesAway
#  <chr>             <int>             <int>
#1 Team1                 2                 4
#2 Team2                 5                 2
#3 Team3                 3                 3
#4 Team4                 2                 3
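
For reference, `gather()` has been superseded by `pivot_longer()` in tidyr 1.0.0 and later. The following is a minimal sketch of the same wide-to-long idea with the newer verbs, assuming tidyr >= 1.0.0 is installed. The object names `games`, `long`, and `counts` are illustrative and do not come from the original post. It counts per team and side with `count()` and spreads the sides back into columns with `pivot_wider()`, so no per-column `sum()` has to be written by hand.

```r
library(dplyr)
library(tidyr)

# Only the two team columns from the question's sample data
games <- tibble(
    HomeTeam = c("Team1", "Team2", "Team3", "Team4", "Team2", "Team2", "Team4",
                 "Team3", "Team2", "Team1", "Team3", "Team2"),
    AwayTeam = c("Team2", "Team1", "Team4", "Team3", "Team1", "Team4", "Team1",
                 "Team2", "Team3", "Team3", "Team4", "Team1"))

# Wide to long: one row per (game, side) pair, so 12 games give 24 rows
long <- games %>%
    pivot_longer(c(HomeTeam, AwayTeam), names_to = "key", values_to = "Team")

# Count rows per team and side, then spread the sides back into columns;
# values_fill = 0L covers a team that never appears on one of the sides
counts <- long %>%
    count(Team, key) %>%
    pivot_wider(names_from = key, values_from = n, values_fill = 0L)

print(counts)
```

The resulting count columns are named after the original variables (`HomeTeam`, `AwayTeam`); `rename()` can map them to other names if needed.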

Update

To handle the updated sample data, which contains additional columns, name the columns to gather explicitly:

data %>%
    gather(key, Team, HomeTeam, AwayTeam) %>%
    group_by(Team) %>%
    summarise(
        NumberOfGamesHome = sum(key == "HomeTeam"),
        NumberOfGamesAway = sum(key == "AwayTeam"))
## A tibble: 4 x 3
#  Team  NumberOfGamesHome NumberOfGamesAway
#  <chr>             <int>             <int>
#1 Team1                 2                 4
#2 Team2                 5                 2
#3 Team3                 3                 3
#4 Team4                 2                 3
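
Since the question also mentions data.table, here is a rough sketch of the same reshape-then-count idea using `melt()`. It assumes the data.table package is installed; `dt`, `long_dt`, `side`, and `result` are illustrative names that do not come from the original post, and only three of the sample columns are reproduced to keep it short.

```r
library(data.table)

# Team columns plus Venue from the question's sample data
dt <- data.table(
    HomeTeam = c("Team1", "Team2", "Team3", "Team4", "Team2", "Team2", "Team4",
                 "Team3", "Team2", "Team1", "Team3", "Team2"),
    AwayTeam = c("Team2", "Team1", "Team4", "Team3", "Team1", "Team4", "Team1",
                 "Team2", "Team3", "Team3", "Team4", "Team1"),
    Venue = c("Ground1", "Ground2", "Ground3", "Ground3", "Ground1", "Ground2",
              "Ground1", "Ground3", "Ground2", "Ground3", "Ground4", "Ground2"))

# Wide to long: stack the two team columns; Venue is carried along as an id column
long_dt <- melt(dt, measure.vars = c("HomeTeam", "AwayTeam"),
                variable.name = "side", value.name = "Team")

# Count home and away appearances per team; keyby also sorts the result by Team
result <- long_dt[, .(NumberOfGamesHome = sum(side == "HomeTeam"),
                      NumberOfGamesAway = sum(side == "AwayTeam")),
                  keyby = Team]

print(result)
```

As with the `gather()` version, only the columns listed in `measure.vars` are stacked, so other columns in a wider dataset do not affect the counts.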

