Condensing a data frame using multiple arguments from certain variables
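I'm looking for a way to condense a data frame based on conditions across several variables, but I'm unsure how to do this in the simplest way. I think it will need some kind of custom function, but I'm inexperienced at writing functions.
Basically, my data frame currently looks like this: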
chainID  teamID  statID            startType  endType
1        Team A  Effective Pass    TO         TO
1        Team A  Effective Pass    TO         TO
1        Team A  Effective Pass    TO         TO
1        Team A  Effective Pass    TO         TO
1        Team A  Ineffective Pass  TO         TO
2        Team B  Effective Pass    TO         SH
2        Team B  Entry             TO         SH
2        Team B  Effective Pass    TO         SH
2        Team B  Shot              TO         SH
3        Team A  Effective Pass    ST         TO
3        Team A  Entry             ST         TO
3        Team A  Ineffective Pass  ST         TO
4        Team B  Effective Pass    TO         ST
4        Team B  Effective Pass    TO         ST
4        Team B  Ineffective Pass  TO         ST
5        Team A  Effective Pass    TO         SH
5        Team A  Entry             TO         SH
5        Team A  Goal              TO         SH
6        Team B  Effective Pass    CB         TO
6        Team B  Effective Pass    CB         TO
6        Team B  Ineffective Pass  CB         TO
7        Team A  Effective Pass    TO         ST
7        Team A  Ineffective Pass  TO         ST
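For reproducibility, the table above could be rebuilt in R roughly as follows. This is a minimal sketch: the name df matches what the answer uses, while n_rows is a helper name introduced here, and storing the text columns as character rather than factor is an assumption.

```r
# Number of rows in each chain (chainID 1 to 7); n_rows is a helper name for this sketch.
n_rows <- c(5L, 4L, 3L, 3L, 3L, 3L, 2L)

df <- data.frame(
  chainID   = rep(1:7, times = n_rows),
  teamID    = rep(c("Team A", "Team B", "Team A", "Team B",
                    "Team A", "Team B", "Team A"), times = n_rows),
  statID    = c(
    rep("Effective Pass", 4), "Ineffective Pass",            # chain 1
    "Effective Pass", "Entry", "Effective Pass", "Shot",     # chain 2
    "Effective Pass", "Entry", "Ineffective Pass",           # chain 3
    "Effective Pass", "Effective Pass", "Ineffective Pass",  # chain 4
    "Effective Pass", "Entry", "Goal",                       # chain 5
    "Effective Pass", "Effective Pass", "Ineffective Pass",  # chain 6
    "Effective Pass", "Ineffective Pass"                     # chain 7
  ),
  startType = rep(c("TO", "TO", "ST", "TO", "TO", "CB", "TO"), times = n_rows),
  endType   = rep(c("TO", "SH", "TO", "ST", "SH", "TO", "ST"), times = n_rows),
  stringsAsFactors = FALSE  # assumption: keep text columns as character
)
```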
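What I'm hoping to do is this: whenever the word Entry appears in the statID column for a chainID, I want to keep that row and the last row of that chainID, and delete all other rows for that chainID (see chainID 2 and 5). In addition, if a chainID contains Entry in statID but its last row is not a Goal or a Shot, I want the following chainID to stay in the data set as well, as with chainID 3 and 4 in my example. The function should then carry on checking each chainID for an Entry as before. For example: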
chainID  teamID  statID            startType  endType
2        Team B  Entry             TO         SH
2        Team B  Shot              TO         SH
3        Team A  Entry             ST         TO
3        Team A  Ineffective Pass  ST         TO
4        Team B  Effective Pass    TO         ST
4        Team B  Effective Pass    TO         ST
4        Team B  Ineffective Pass  TO         ST
5        Team A  Entry             TO         SH
5        Team A  Goal              TO         SH
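The answer is split into two functions. The first, select_rows, chooses which rows to keep within each group depending on whether "Entry" is present. The second, select_groups, flags the groups that contain an "Entry" but do not end in "Goal" or "Shot".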
library(dplyr)

# anyEntry: row number of the first "Entry" in the group, or 0 if there is none.
select_rows <- function(anyEntry, statID) {
  if (anyEntry[1L] > 0) {
    if (last(statID) %in% c("Goal", "Shot")) {
      # Chain ends in "Goal" or "Shot": keep the "Entry" row and the last row.
      c(anyEntry[1L], length(anyEntry))
    } else {
      # Otherwise keep every row from "Entry" to the end of the chain.
      anyEntry[1L]:length(anyEntry)
    }
  } else {
    0  # no "Entry" in this group: index 0 selects no rows in slice()
  }
}

# TRUE for groups that contain an "Entry" but do not end in "Goal" or "Shot".
select_groups <- function(anyEntry, statID) {
  anyEntry[1L] > 0 && !(last(statID) %in% c("Goal", "Shot"))
}
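We create an anyEntry column that holds the row number of the first "Entry" within each group, or 0 if the group has no "Entry". We then apply select_rows and select_groups separately and bind the resulting rows together.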
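The anyEntry value comes from combining which.max() with any(). A small sketch on plain vectors shows why both are needed; first_entry is a hypothetical helper name used only for this illustration.

```r
# first_entry() is a hypothetical helper, not part of the answer's code.
first_entry <- function(statID) {
  is_entry <- statID == "Entry"
  # which.max() gives the position of the first TRUE, but it also returns 1
  # when every value is FALSE, so multiplying by any() turns that case into 0.
  which.max(is_entry) * any(is_entry)
}

first_entry(c("Effective Pass", "Entry", "Effective Pass", "Shot"))  # 2
first_entry(c("Effective Pass", "Ineffective Pass"))                 # 0
```

The pipeline below evaluates the same expression once per chainID group.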
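# Row number of the first "Entry" per chain (0 when the chain has none).
df1 <- df %>%
  group_by(chainID) %>%
  mutate(anyEntry = which.max(statID == "Entry") * any(statID == "Entry"))

# Chains that contain an "Entry" but do not end in "Goal" or "Shot".
Ids <- df1 %>%
  summarise(newEntry = select_groups(anyEntry, statID)) %>%
  filter(newEntry) %>%
  pull(chainID)

# Keep the selected rows, then add every row of the chain following each flagged chain.
# Note: Ids + 1 assumes that chainID values are consecutive integers.
df1 %>%
  slice(select_rows(anyEntry, statID)) %>%
  bind_rows(df %>% filter(chainID %in% (Ids + 1))) %>%
  select(-anyEntry) %>%
  arrange(chainID)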
#   chainID teamID statID          startType endType
#     <int> <fct>  <fct>           <fct>     <fct>
# 1       2 TeamB  Entry           TO        SH
# 2       2 TeamB  Shot            TO        SH
# 3       3 TeamA  Entry           ST        TO
# 4       3 TeamA  IneffectivePass ST        TO
# 5       4 TeamB  EffectivePass   TO        ST
# 6       4 TeamB  EffectivePass   TO        ST
# 7       4 TeamB  IneffectivePass TO        ST
# 8       5 TeamA  Entry           TO        SH
# 9       5 TeamA  Goal            TO        SH
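The Ids + 1 step only works when chainID values are consecutive integers. If that does not hold for your data, one possible alternative is to look up the chain that actually follows each flagged chain. This is a sketch with made-up IDs; chain_order and next_chain are names introduced here, not part of the answer.

```r
# Toy values with gaps, so that "ID + 1" would point at a chain that does not exist.
chain_order <- c(2, 3, 7, 9)  # distinct chainIDs in order of appearance (made up)
Ids <- c(3)                   # flagged chains: have an "Entry" but no "Goal"/"Shot" ending

# match() finds the position of each flagged chain; adding 1 moves to the next chain.
# A flagged chain in last position yields NA, which matches no chainID.
next_chain <- chain_order[match(Ids, chain_order) + 1L]
next_chain  # 7
```

Under that assumption, filter(chainID %in% next_chain) would take the place of filter(chainID %in% (Ids + 1)), with chain_order taken from unique(df$chainID).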