Condensing a data frame using multiple conditions on certain variables

Question

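I am looking for a way to condense a data frame based on several conditions across multiple variables, but I am not sure how to do it in the simplest way. I think it will need some sort of custom function, but I am not experienced in writing functions.

Basically, my data frame currently looks like this: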

chainID     teamID        statID        startType       endType        

1           Team A     Effective Pass      TO              TO
1           Team A     Effective Pass      TO              TO
1           Team A     Effective Pass      TO              TO
1           Team A     Effective Pass      TO              TO
1           Team A     Ineffective Pass    TO              TO
2           Team B     Effective Pass      TO              SH
2           Team B     Entry               TO              SH
2           Team B     Effective Pass      TO              SH
2           Team B     Shot                TO              SH
3           Team A     Effective Pass      ST              TO
3           Team A     Entry               ST              TO
3           Team A     Ineffective Pass    ST              TO
4           Team B     Effective Pass      TO              ST
4           Team B     Effective Pass      TO              ST
4           Team B     Ineffective Pass    TO              ST
5           Team A     Effective Pass      TO              SH
5           Team A     Entry               TO              SH
5           Team A     Goal                TO              SH
6           Team B     Effective Pass      CB              TO
6           Team B     Effective Pass      CB              TO
6           Team B     Ineffective Pass    CB              TO
7           Team A     Effective Pass      TO              ST
7           Team A     Ineffective Pass    TO              ST

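What I am hoping to do is this: whenever the word Entry appears in the statID column for a chainID, I want to keep that row and the last row of that chainID, and remove all other rows for that chainID (see chainID 2 and 5). In addition, if a chainID contains Entry in statID but its last row is not a Goal or a Shot, I would like the next chainID to stay in the dataset as well, as shown in my example with chainID 3 and 4. The function should then carry on checking each chainID for occurrences of Entry as before. For example: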

chainID     teamID        statID        startType       endType        

2           Team B     Entry               TO              SH
2           Team B     Shot                TO              SH
3           Team A     Entry               ST              TO
3           Team A     Ineffective Pass    ST              TO
4           Team B     Effective Pass      TO              ST
4           Team B     Effective Pass      TO              ST
4           Team B     Ineffective Pass    TO              ST
5           Team A     Entry               TO              SH
5           Team A     Goal                TO              SH

Answer 1

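The answer is split into two functions. The first, select_rows, selects rows from each group depending on whether "Entry" is present. The second, select_groups, identifies the groups that contain "Entry" but do not end in "Goal" or "Shot".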

library(dplyr)

select_rows <- function(anyEntry, statID) {
  # anyEntry holds the row number of the first "Entry" in the group, or 0 if there is none
  if (anyEntry[1L]) {
    # If the last value is "Goal" or "Shot", keep the "Entry" row and the last row;
    # otherwise keep every row from "Entry" through the last row.
    if (last(statID) %in% c("Goal", "Shot")) {
      c(anyEntry[1L], length(anyEntry))
    } else {
      anyEntry[1L]:length(anyEntry)
    }
  } else {
    0  # slice(0) keeps no rows
  }
}

select_groups <- function(anyEntry, statID) {
  # TRUE when the group has an "Entry" but does not end in "Goal" or "Shot"
  anyEntry[1L] & !last(statID) %in% c("Goal", "Shot")
}
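
The pipeline further down expects the question's data in an object called df, which is never defined in the answer. A minimal sketch for rebuilding it from the table in the question, assuming chainID is an integer and the other columns are plain character vectors (the helper name chain_len is made up for this illustration):

```r
# Number of rows in each of the seven chains, read off the question's table
chain_len <- c(5, 4, 3, 3, 3, 3, 2)

# Rebuild the sample data; `df` is the name the answer's code refers to
df <- data.frame(
  chainID   = rep(1:7, times = chain_len),
  teamID    = rep(c("Team A", "Team B", "Team A", "Team B",
                    "Team A", "Team B", "Team A"), times = chain_len),
  statID    = c(
    "Effective Pass", "Effective Pass", "Effective Pass", "Effective Pass", "Ineffective Pass",  # chain 1
    "Effective Pass", "Entry", "Effective Pass", "Shot",                                         # chain 2
    "Effective Pass", "Entry", "Ineffective Pass",                                               # chain 3
    "Effective Pass", "Effective Pass", "Ineffective Pass",                                      # chain 4
    "Effective Pass", "Entry", "Goal",                                                           # chain 5
    "Effective Pass", "Effective Pass", "Ineffective Pass",                                      # chain 6
    "Effective Pass", "Ineffective Pass"                                                         # chain 7
  ),
  startType = rep(c("TO", "TO", "ST", "TO", "TO", "CB", "TO"), times = chain_len),
  endType   = rep(c("TO", "SH", "TO", "ST", "SH", "TO", "ST"), times = chain_len),
  stringsAsFactors = FALSE
)
```

If the real data stores these columns as factors, the comparisons with "Entry", "Goal" and "Shot" should behave the same way.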

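We create a column anyEntry that holds, for each group, the row number of the first "Entry" value, or 0 if the group has none. We then apply select_rows and select_groups independently and bind the resulting rows together.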

# Row number of the first "Entry" within each chain, or 0 if the chain has none
df1 <- df %>%
  group_by(chainID) %>%
  mutate(anyEntry = which.max(statID == "Entry") * any(statID == "Entry"))
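
The anyEntry expression in the mutate() call above combines two base R behaviours that are easy to miss. A small sketch on two made-up vectors (with_entry and without_entry are illustrative names, not part of the answer):

```r
# Two hypothetical chains: one containing "Entry", one without it
with_entry    <- c("Effective Pass", "Entry", "Effective Pass", "Shot")
without_entry <- c("Effective Pass", "Effective Pass", "Ineffective Pass")

# which.max() on a logical vector returns the index of the first TRUE,
# but it also returns 1 when every value is FALSE ...
which.max(with_entry == "Entry")      # 2
which.max(without_entry == "Entry")   # 1

# ... so multiplying by any() (TRUE counts as 1, FALSE as 0)
# zeroes out the chain that has no "Entry".
pos_with    <- which.max(with_entry == "Entry") * any(with_entry == "Entry")        # 2
pos_without <- which.max(without_entry == "Entry") * any(without_entry == "Entry")  # 0
```

This is why a value of 0 in anyEntry can safely be read as "no Entry in this chain".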

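# IDs of chains that contain "Entry" but do not end in "Goal" or "Shot"
Ids <- df1 %>%
  summarise(newEntry = select_groups(anyEntry, statID)) %>%
  filter(newEntry) %>%
  pull(chainID)

# Keep the selected rows of each chain, then add every row of the chain that
# follows a flagged one (Ids + 1 assumes consecutive integer chainIDs)
df1 %>%
  slice(select_rows(anyEntry, statID)) %>%
  bind_rows(df %>% filter(chainID %in% (Ids + 1))) %>%
  select(-anyEntry) %>%
  arrange(chainID)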

#   chainID teamID statID          startType endType
#     <int> <fct>  <fct>           <fct>     <fct>
#1        2 TeamB  Entry           TO        SH
#2        2 TeamB  Shot            TO        SH
#3        3 TeamA  Entry           ST        TO
#4        3 TeamA  IneffectivePass ST        TO
#5        4 TeamB  EffectivePass   TO        ST
#6        4 TeamB  EffectivePass   TO        ST
#7        4 TeamB  IneffectivePass TO        ST
#8        5 TeamA  Entry           TO        SH
#9        5 TeamA  Goal            TO        SH
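
Note that Ids + 1 only finds the "next" chain when chainID values are consecutive integers. If the real data has gaps in its IDs, one possible alternative is to look up the following chain with lag() on a one-row-per-chain summary. The sketch below uses a made-up summary table (chains, with deliberately non-consecutive IDs) in place of the output of select_groups:

```r
library(dplyr)

# Hypothetical summary: one row per chain, with the flag select_groups() would return
chains <- tibble(
  chainID  = c(10L, 20L, 35L, 40L),          # deliberately non-consecutive
  newEntry = c(FALSE, TRUE, FALSE, FALSE)    # chain 20 has an unfinished "Entry"
)

# A chain is kept in full when the chain just before it (in ID order) was flagged
next_ids <- chains %>%
  arrange(chainID) %>%
  mutate(follows_open_entry = lag(newEntry, default = FALSE)) %>%
  filter(follows_open_entry) %>%
  pull(chainID)

next_ids   # 35
```

In the answer's pipeline, next_ids would then take the place of (Ids + 1) inside filter(chainID %in% ...).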

Solution 1
1 Accepted 2019-06-26 10:05:28
