
R function to summarize multiple columns of data with multiple functions, grouped by a column

I have a data frame with the following columns:

  • game_id - chr, one ID per game, multiple rows per game
  • home_lineup - chr
  • away_lineup - chr
  • home_plusminus - int
  • away_plusminus - int
  • home_team - chr
  • away_team - chr

For each home_lineup and each away_lineup, I need to calculate the total of home_plusminus and away_plusminus, as well as the per-game average.

The data looks like this:

game_id home_lineup awaylineup home_Plusminus Away_Plusminus home_team  away_team
12345   L1          L2          -2              2            BOS         ATL
12345   L3          L4           3             -3            BOS         ATL
12345   L3          L4           3             -3            BOS         ATL
45678   L2          L1           3             -3            ATL         BOS
45678   L2          L7           1             -1            ATL         BOS
45678   L8          L1           3             -3            ATL         BOS

The data above covers 2 games.

I would like the final output to look like this:

Team Lineup PlusMinus Pergame
BOS  L1     -8        -4.0
BOS  L3      6         6.0
BOS  L7     -1        -1.0
ATL  L2      6         3.0
ATL  L4     -6        -6.0
ATL  L8      3         3.0

So in the example above, L1 played in two games with a total plus-minus of -8, which gives -4.0 per game. L3 played in only one game.

Here is one approach using tidyr and dplyr.

library(tidyr); library(dplyr)

# Step 1 - make into tidy data frame with one row per observation
home <- df %>% select(game_id, contains("home")) %>% 
  rename("Lineup" = "home_lineup", "Team" = "home_team", "plusminus" = "home_Plusminus")

away <- df %>% select(game_id, contains("away")) %>% 
  rename("Lineup" = "awaylineup", "Team" = "away_team", "plusminus" = "Away_Plusminus")

tidy <- bind_rows(home, away, .id = "location")



# Step 2 - summarize
output <- tidy %>%
  group_by(Team, Lineup) %>%
  summarize(PlusMinus = sum(plusminus),
            PerGame = PlusMinus/n_distinct(game_id)) %>% ungroup()

Output:

> output
# A tibble: 6 x 4
  Team  Lineup PlusMinus PerGame
  <chr> <chr>      <int>   <dbl>
1 ATL   L2             6       3
2 ATL   L4            -6      -6
3 ATL   L8             3       3
4 BOS   L1            -8      -4
5 BOS   L3             6       6
6 BOS   L7            -1      -1

Sample data:

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
                 game_id home_lineup awaylineup  home_Plusminus  Away_Plusminus  home_team   away_team
 12345  L1          L2          -2              2               BOS       ATL
     12345  L3          L4           3             -3               BOS       ATL
     12345  L3          L4           3             -3               BOS       ATL
     45678  L2          L1           3             -3               ATL       BOS
     45678  L2          L7           1             -1               ATL       BOS
     45678  L8          L1           3             -3               ATL       BOS")
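
For readers who prefer to avoid extra packages, the same stack-then-summarize idea can be sketched in base R. This is only an illustration built on the sample data above; the names `long`, `totals`, `games`, and the `Games` column are hypothetical helpers, not part of the original answer.

```r
# Sample data copied from the answer above
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
game_id home_lineup awaylineup home_Plusminus Away_Plusminus home_team away_team
12345   L1          L2         -2              2             BOS       ATL
12345   L3          L4          3             -3             BOS       ATL
12345   L3          L4          3             -3             BOS       ATL
45678   L2          L1          3             -3             ATL       BOS
45678   L2          L7          1             -1             ATL       BOS
45678   L8          L1          3             -3             ATL       BOS")

# Stack the home and away columns into one long data frame
long <- rbind(
  data.frame(game_id = df$game_id, Team = df$home_team,
             Lineup = df$home_lineup, plusminus = df$home_Plusminus),
  data.frame(game_id = df$game_id, Team = df$away_team,
             Lineup = df$awaylineup, plusminus = df$Away_Plusminus)
)

# Total plus-minus per team/lineup
totals <- aggregate(plusminus ~ Team + Lineup, data = long, FUN = sum)

# Number of distinct games each team/lineup appeared in
games <- aggregate(game_id ~ Team + Lineup, data = long,
                   FUN = function(x) length(unique(x)))

# Combine both summaries and derive the per-game average
output <- merge(totals, games, by = c("Team", "Lineup"))
names(output) <- c("Team", "Lineup", "PlusMinus", "Games")
output$PerGame <- output$PlusMinus / output$Games
output
```

Dividing by the number of distinct `game_id` values, rather than by the row count, is what makes `PerGame` a per-game figure when a lineup appears in several rows of the same game.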

A solution similar to Jon's:

library(tidyverse)

dat <- tribble(
  ~game_id, ~home_lineup, ~awaylineup,  ~home_Plusminus,  ~Away_Plusminus,  ~home_team,   ~away_team,
  12345,  "L1",          "L2",          -2,              2,               "BOS",       "ATL",
  12345,  "L3",          "L4",           3,             -3,               "BOS",       "ATL",
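  # The repeated L3/L4 row from the question is left commented out here,
  # so the L3 and L4 totals below are 3 and -3 rather than 6 and -6.
  # 12345,  "L3",          "L4",           3,             -3,               "BOS",       "ATL",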
  45678,  "L2",          "L1",           3,             -3,               "ATL",       "BOS",
  45678,  "L2",          "L7",           1,             -1,               "ATL",       "BOS",
  45678,  "L8",          "L1",           3,             -3,               "ATL",       "BOS"
)

long <- 
  dat %>% 
  gather(where, team, home_team:away_team) %>% 
  mutate(
    home_lineup = case_when(where == "home_team" ~ home_lineup,
                            TRUE ~ NA_character_),
    away_lineup = case_when(where == "away_team" ~ awaylineup,
                            TRUE ~ NA_character_),
    home_plusminus = case_when(where == "home_team" ~ home_Plusminus,
                            TRUE ~ NA_real_),
    away_plusminus = case_when(where == "away_team" ~ Away_Plusminus,
                            TRUE ~ NA_real_)
  ) %>% 
  select(-home_Plusminus, -Away_Plusminus, -awaylineup) %>% 
  gather(plus_minus_type, plus_minus, home_plusminus:away_plusminus) %>%
  gather(lineup_type, lineup, home_lineup:away_lineup, -where, -team) %>% 
  mutate(
    where = where %>% str_remove("_team"),
    lineup_type = lineup_type %>% str_remove("_") %>% str_remove("lineup"),
    plus_minus_type = plus_minus_type %>% str_remove("_plusminus")
  ) %>% 
  drop_na()

long %>% 
  group_by(
    team, lineup
  ) %>% 
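  summarise(
    PlusMinus = sum(plus_minus),
    # n() counts rows, not games; n_distinct(game_id) would give the
    # per-game average the question asks for
    Pergame = sum(plus_minus) / n()
  )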
#> # A tibble: 6 x 4
#> # Groups:   team [?]
#>   team  lineup PlusMinus Pergame
#>   <chr> <chr>      <dbl>   <dbl>
#> 1 ATL   L2             6    2   
#> 2 ATL   L4            -3   -3   
#> 3 ATL   L8             3    3   
#> 4 BOS   L1            -8   -2.67
#> 5 BOS   L3             3    3   
#> 6 BOS   L7            -1   -1

Created on 2018-10-26 by the reprex package (v0.2.1)
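
In current tidyr, `gather()` has been superseded by `pivot_longer()`, so the reshape can be written more compactly. The sketch below is one possible version, not the original author's code. It assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later, and it first renames the columns to a consistent `<side>_<measure>` pattern (those normalized names are my own assumption) so that `names_sep = "_"` can split them.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, including the repeated L3/L4 row
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
game_id home_lineup awaylineup home_Plusminus Away_Plusminus home_team away_team
12345   L1          L2         -2              2             BOS       ATL
12345   L3          L4          3             -3             BOS       ATL
12345   L3          L4          3             -3             BOS       ATL
45678   L2          L1          3             -3             ATL       BOS
45678   L2          L7          1             -1             ATL       BOS
45678   L8          L1          3             -3             ATL       BOS")

# Assumption: normalize the inconsistent names to <side>_<measure>
names(df) <- c("game_id", "home_lineup", "away_lineup",
               "home_plusminus", "away_plusminus",
               "home_team", "away_team")

# One row per game row and side; ".value" keeps lineup, plusminus
# and team as separate columns with their own types
tidy <- df %>%
  pivot_longer(
    cols = -game_id,
    names_to = c("location", ".value"),
    names_sep = "_"
  )

# Total plus-minus and per-game average for each team/lineup
output <- tidy %>%
  group_by(team, lineup) %>%
  summarise(
    PlusMinus = sum(plusminus),
    PerGame = PlusMinus / n_distinct(game_id),
    .groups = "drop"
  )

output
```

Because the repeated L3/L4 row is kept and the divisor is `n_distinct(game_id)`, this version should line up with the totals requested in the question rather than with the per-row averages printed above.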
