Filter dataframe on sequence of rows conditional on two columns
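I have data of this type, where the numeric values in column Sequ define sequences of rows and the character values in column Q name the type of each sequence: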
df <- data.frame(
  Line = 1:12,
  Speaker = c(NA, "ID01.A", NA, "ID01.B", "ID07.A", NA, "ID33.B",
              "ID33.A", "ID33.C", NA, "ID77.A", "ID77.C"),
  Utterance = c(NA, "Who did it?", "(1.99)", "Peter did.", "Hello!", NA, "So you're coming?",
                "erm", "Yes, sure.", "(0.22)", "Good night?", "Yeah, sleep well"),
  Sequ = c(NA,1,1,1, NA,NA, 2,2,2, NA, 3,3),
  Q = c(NA, "q_wh", "", "", NA, NA, "q_decl", "", "", NA, "q_wh", "")
)
I want to subset the dataframe to the rows whose Sequ value is a number (not NA) and whose sequence has Q == "q_wh". I can do this with na_if followed by fill:
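library(dplyr)  # mutate(), filter(), na_if(), %>%
library(tidyr)  # fill()
df %>%
  # turn empty strings into NA so that fill() can overwrite them
  mutate(Q = na_if(Q, "")) %>%
  # carry the sequence type down to the following rows
  fill(Q, .direction = "down") %>%
  filter(!is.na(Sequ) & Q == "q_wh")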
  Line Speaker        Utterance Sequ    Q
1    2  ID01.A      Who did it?    1 q_wh
2    3    <NA>           (1.99)    1 q_wh
3    4  ID01.B       Peter did.    1 q_wh
4   11  ID77.A      Good night?    3 q_wh
5   12  ID77.C Yeah, sleep well    3 q_wh
But is there another, more direct way to filter df, without the detour via na_if and fill?
Just subset with a condition:
df[with(df, !is.na(Sequ) & Q == 'q_wh'), ]
#    Line Speaker   Utterance Sequ    Q
# 2     2  ID01.A Who did it?    1 q_wh
# 11   11  ID77.A Good night?    3 q_wh
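Note that the condition above keeps only the rows where Q itself equals "q_wh" (lines 2 and 11), not the follow-up rows of those sequences. If the whole sequences are wanted, as in the output shown in the question, one possible sketch in base R is to collect the matching Sequ ids first and then keep every row that carries one of them. The names wh_ids and res are made up here for illustration and are not part of the original answer.

```r
# Same data as in the question
df <- data.frame(
  Line = 1:12,
  Speaker = c(NA, "ID01.A", NA, "ID01.B", "ID07.A", NA, "ID33.B",
              "ID33.A", "ID33.C", NA, "ID77.A", "ID77.C"),
  Utterance = c(NA, "Who did it?", "(1.99)", "Peter did.", "Hello!", NA, "So you're coming?",
                "erm", "Yes, sure.", "(0.22)", "Good night?", "Yeah, sleep well"),
  Sequ = c(NA, 1, 1, 1, NA, NA, 2, 2, 2, NA, 3, 3),
  Q = c(NA, "q_wh", "", "", NA, NA, "q_decl", "", "", NA, "q_wh", "")
)

# Sequ ids of the sequences that contain a "q_wh" row.
# %in% treats NA as a non-match, so the NA entries in Q are skipped.
wh_ids <- df$Sequ[df$Q %in% "q_wh"]

# Keep every row whose Sequ is one of those ids.
# Rows with NA in Sequ never match and are dropped.
res <- df[df$Sequ %in% wh_ids, ]
res$Line
# [1]  2  3  4 11 12
```

Unlike the na_if/fill version, this leaves Q untouched, so the follow-up rows still have an empty string in Q rather than "q_wh".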