
Condensing a data frame using multiple arguments from certain variables

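I am looking for a way to condense a data frame based on several conditions across multiple variables, but I'm not sure how to do it in the simplest way. I think it will require some kind of custom function, but I don't have much experience writing functions.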

Basically, my data frame currently looks like this:

chainID     teamID        statID        startType       endType        

1           Team A     Effective Pass      TO              TO
1           Team A     Effective Pass      TO              TO
1           Team A     Effective Pass      TO              TO
1           Team A     Effective Pass      TO              TO
1           Team A     Ineffective Pass    TO              TO
2           Team B     Effective Pass      TO              SH
2           Team B     Entry               TO              SH
2           Team B     Effective Pass      TO              SH
2           Team B     Shot                TO              SH
3           Team A     Effective Pass      ST              TO
3           Team A     Entry               ST              TO
3           Team A     Ineffective Pass    ST              TO
4           Team B     Effective Pass      TO              ST
4           Team B     Effective Pass      TO              ST
4           Team B     Ineffective Pass    TO              ST
5           Team A     Effective Pass      TO              SH
5           Team A     Entry               TO              SH
5           Team A     Goal                TO              SH
6           Team B     Effective Pass      CB              TO
6           Team B     Effective Pass      CB              TO
6           Team B     Ineffective Pass    CB              TO
7           Team A     Effective Pass      TO              ST
7           Team A     Ineffective Pass    TO              ST
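The answer below refers to this table as df. A minimal sketch for rebuilding it as a reproducible data frame, assuming all text columns are read as character strings (the answer's printout shows factor columns, so your own import may differ):

```r
# Rows per chain, in chainID order (counted from the table above)
sizes <- c(5, 4, 3, 3, 3, 3, 2)

df <- data.frame(
  chainID   = rep(1:7, times = sizes),
  # Teams alternate A, B, A, ... from chain 1 to chain 7
  teamID    = rep(rep(c("Team A", "Team B"), length.out = 7), times = sizes),
  statID    = c(rep("Effective Pass", 4), "Ineffective Pass",            # chain 1
                "Effective Pass", "Entry", "Effective Pass", "Shot",      # chain 2
                "Effective Pass", "Entry", "Ineffective Pass",            # chain 3
                "Effective Pass", "Effective Pass", "Ineffective Pass",   # chain 4
                "Effective Pass", "Entry", "Goal",                        # chain 5
                "Effective Pass", "Effective Pass", "Ineffective Pass",   # chain 6
                "Effective Pass", "Ineffective Pass"),                    # chain 7
  startType = rep(c("TO", "TO", "ST", "TO", "TO", "CB", "TO"), times = sizes),
  endType   = rep(c("TO", "SH", "TO", "ST", "SH", "TO", "ST"), times = sizes),
  stringsAsFactors = FALSE
)
```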

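What I want to do is this: whenever the word Entry appears in the statID column for a given chainID, I want to keep that row and the last row of that chainID, and delete all other rows for that chainID (see chainID 2 and 5). In addition, if a chainID contains Entry in statID but the last row of that chainID does not end with a goal or a shot, I want the next chainID to be kept in the dataset as well, as shown in my example with chainID 3 and 4. The function then continues looking for occurrences of Entry in each chainID, as it did at the start. For example, the desired output is: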

chainID     teamID        statID        startType       endType        

2           Team B     Entry               TO              SH
2           Team B     Shot                TO              SH
3           Team A     Entry               ST              TO
3           Team A     Ineffective Pass    ST              TO
4           Team B     Effective Pass      TO              ST
4           Team B     Effective Pass      TO              ST
4           Team B     Ineffective Pass    TO              ST
5           Team A     Entry               TO              SH
5           Team A     Goal                TO              SH

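The answer is split into two functions. The first, select_rows, selects rows from each group depending on whether "Entry" is present and on how the group ends. The second, select_groups, finds the groups that contain "Entry" but do not end with "Goal" or "Shot".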

library(dplyr)

select_rows <- function(anyEntry, statID) {
   #If anyEntry value is not 0
   if(anyEntry[1L]) { 
      #If the last value is either "Goal" or "Shot" select "Entry" row and last row
      #else select all the rows from "Entry" to last row. 
      if(last(statID) %in% c("Goal", "Shot")) c(anyEntry[1L], length(anyEntry)) 
         else anyEntry[1L] : length(anyEntry) 
     } else 0
}

select_groups <- function(anyEntry, statID) {
    anyEntry[1L] & !last(statID) %in% c("Goal", "Shot")
}
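In select_groups, the expression !last(statID) %in% c("Goal", "Shot") relies on R's operator precedence: %in% binds more tightly than !, so the negation applies to the whole membership test. The & then treats a nonzero anyEntry value as TRUE and 0 as FALSE. A small check using made-up last values (last_stat is an illustrative name, not from the answer):

```r
finish    <- c("Goal", "Shot")
last_stat <- c("Ineffective Pass", "Shot")  # made-up last statID values

# %in% is evaluated before !, so these two forms are the same test
without_parens <- !last_stat %in% finish
with_parens    <- !(last_stat %in% finish)
print(without_parens)  # TRUE FALSE

# In &, a nonzero integer counts as TRUE and 0 counts as FALSE
print(c(3L, 0L) & without_parens[1])  # TRUE FALSE
```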

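We create an anyEntry column that holds, for each group containing "Entry", the row number of the first "Entry" value, and 0 otherwise. We then apply the select_rows and select_groups functions independently and bind the rows together; the chainIDs returned by select_groups are shifted by 1 so that the following chain is kept in full.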
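The anyEntry expression works because which.max() on a logical vector returns the position of the first TRUE (or 1 when every value is FALSE), and multiplying by any(...) turns the all-FALSE case into 0. A base-R illustration on chains 1 and 2 of the example, where first_entry_pos is a hypothetical helper name:

```r
first_entry_pos <- function(statID) {
  is_entry <- statID == "Entry"
  # Position of the first TRUE (1 if none), times TRUE/FALSE, so 0 when no Entry
  which.max(is_entry) * any(is_entry)
}

chain1 <- c(rep("Effective Pass", 4), "Ineffective Pass")
chain2 <- c("Effective Pass", "Entry", "Effective Pass", "Shot")
first_entry_pos(chain1)   # 0
first_entry_pos(chain2)   # 2
```

The same value can also be obtained with match("Entry", statID, nomatch = 0L), which some readers may find more direct.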

df1 <- df %>%
        group_by(chainID) %>%
        mutate(anyEntry = which.max(statID == "Entry") * any(statID == "Entry"))

Ids <- df1 %>%
         summarise(newEntry = select_groups(anyEntry, statID)) %>%
         filter(newEntry) %>% pull(chainID)

df1 %>%
  slice(select_rows(anyEntry, statID)) %>%
  bind_rows(df %>% filter(chainID %in% (Ids + 1))) %>%
  select(-anyEntry) %>%
  arrange(chainID)

#   chainID teamID statID    startType  endType
#     <int> <fct>  <fct>        <fct>     <fct>  
#1       2 TeamB  Entry           TO        SH     
#2       2 TeamB  Shot            TO        SH     
#3       3 TeamA  Entry           ST        TO     
#4       3 TeamA  IneffectivePass ST        TO     
#5       4 TeamB  EffectivePass   TO        ST     
#6       4 TeamB  EffectivePass   TO        ST     
#7       4 TeamB  IneffectivePass TO        ST     
#8       5 TeamA  Entry           TO        SH     
#9       5 TeamA  Goal            TO        SH   
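For comparison, a base-R sketch of the same rules that walks the chains in order and carries a flag forward, instead of joining two dplyr results. condense_chains and its entry/finish parameters are hypothetical names. The sketch assumes chains are numbered consecutively and that a chain carried over in full (like chainID 4) is kept as-is rather than condensed again.

```r
# Hypothetical base-R alternative: one pass over the chains with a carry flag
condense_chains <- function(df, entry = "Entry", finish = c("Goal", "Shot")) {
  pieces <- list()
  carry  <- FALSE  # TRUE when the previous chain had an Entry but no Goal/Shot finish
  for (chain in split(df, df$chainID)) {   # chains in ascending chainID order
    stat <- as.character(chain$statID)     # works for character or factor columns
    n <- nrow(chain)
    first_entry <- match(entry, stat)      # NA when the chain has no Entry row
    if (carry) {
      # Previous chain was unfinished: keep this chain whole
      pieces[[length(pieces) + 1]] <- chain
      carry <- FALSE
    } else if (!is.na(first_entry)) {
      ends_in_finish <- stat[n] %in% finish
      # Finished: Entry row plus last row; unfinished: Entry row through the end
      rows <- if (ends_in_finish) unique(c(first_entry, n)) else first_entry:n
      pieces[[length(pieces) + 1]] <- chain[rows, , drop = FALSE]
      carry <- !ends_in_finish
    }
  }
  if (length(pieces) == 0) return(df[0, , drop = FALSE])
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}
```

On the example data this keeps the same nine rows as the desired output. Because the carried chain is handled inside the same loop, no chain is added twice, and unique() avoids duplicating a row when "Entry" is itself the last row of a finished chain.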


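Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.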

 