Filter dataframe when all columns are NA in `dplyr`
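This is surely a simple question (if you know the answer), but I still can't find guidance on SO: I have a data frame with many rows that contain only NA in all columns (after a `lead` operation). I want to remove these rows: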
df <- structure(list(line = c("0001", NA, "0002", NA, "0003", NA, "0004",
NA, "0005", NA),
speaker = c(NA, NA, "ID16.C-U", NA, NA, NA, "ID16.B-U", NA, NA, NA),
utterance = c("7.060", NA, " ah-ha,", NA, "0.304", NA, " °°yes°°", NA, "7.740", NA),
timestamp = c(NA, "00:00:00.000 - 00:00:07.060", NA, "00:00:07.060 - 00:00:07.660", NA,
"00:00:07.660 - 00:00:07.964", NA, "00:00:07.964 - 00:00:08.610", NA,
"00:00:08.610 - 00:00:16.350")), row.names = c(NA, 10L), class = "data.frame")
But this does not work:
df %>%
mutate(timestamp = lead(timestamp)) %>%
filter(across(everything(), ~!is.na(.)))
Nor does this:
df %>%
mutate(timestamp = lead(timestamp)) %>%
rowwise() %>%
filter(c_across(everything(), ~!is.na(.)))
What is the solution?
Expected:
line speaker utterance timestamp
1 0001 <NA> 7.060 00:00:00.000 - 00:00:07.060
3 0002 ID16.C-U ah-ha, 00:00:07.060 - 00:00:07.660
5 0003 <NA> 0.304 00:00:07.660 - 00:00:07.964
7 0004 ID16.B-U °°yes°° 00:00:07.964 - 00:00:08.610
9 0005 <NA> 7.740 00:00:08.610 - 00:00:16.350
Would this work?
df <- df %>% mutate(timestamp = lead(timestamp))
df[rowSums(is.na(df))!=ncol(df),]
A pseudo-tidyverse version:
df %>%
dplyr::mutate(timestamp = dplyr::lead(timestamp)) %>%
dplyr::filter(rowSums(is.na(.))!=ncol(.))
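To see why the comparison works: `is.na(df)` gives a logical matrix, `rowSums()` counts the NA cells in each row, and a row is entirely NA exactly when that count equals `ncol(df)`. A minimal sketch on a hypothetical toy data frame (`toy` and the helper variable names are illustrative, not from the question):

```r
# Hypothetical toy data: row 2 is entirely NA, row 3 is only partly NA
toy <- data.frame(
  a = c("x", NA, NA),
  b = c("y", NA, "z"),
  stringsAsFactors = FALSE
)

# Count the NA cells in each row
na_per_row <- rowSums(is.na(toy))

# A row is "all NA" when its NA count equals the number of columns
all_na <- na_per_row == ncol(toy)

# Keep every row that is not entirely NA
kept <- toy[!all_na, ]
```

Note that base subsetting keeps the original row names, whereas the `dplyr::filter()` version renumbers the rows.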
`dplyr` has new functions `if_all()` and `if_any()` to handle cases like this:
library(dplyr, warn.conflicts = FALSE)
df %>%
mutate(timestamp = lead(timestamp)) %>%
filter(!if_all(everything(), is.na))
#> line speaker utterance timestamp
#> 1 0001 <NA> 7.060 00:00:00.000 - 00:00:07.060
#> 2 0002 ID16.C-U ah-ha, 00:00:07.060 - 00:00:07.660
#> 3 0003 <NA> 0.304 00:00:07.660 - 00:00:07.964
#> 4 0004 ID16.B-U °°yes°° 00:00:07.964 - 00:00:08.610
#> 5 0005 <NA> 7.740 00:00:08.610 - 00:00:16.350
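As a hedged illustration of how the two helpers relate: `!if_all(everything(), is.na)` drops rows in which every column is NA, which by De Morgan's law is the same as keeping rows where `if_any()` finds at least one non-NA value. The sketch below uses a hypothetical toy data frame (`toy` is not from the question) and assumes dplyr 1.0.4 or later, the release in which these functions were introduced.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: row 2 is entirely NA, row 3 is only partly NA
toy <- data.frame(
  a = c("x", NA, NA),
  b = c("y", NA, "z"),
  stringsAsFactors = FALSE
)

# Drop rows where every column is NA
kept_all <- toy %>%
  filter(!if_all(everything(), is.na))

# Equivalent formulation: keep rows with at least one non-NA value
kept_any <- toy %>%
  filter(if_any(everything(), ~ !is.na(.x)))

# Stricter variant for contrast: keep only rows with no NA at all
complete_only <- toy %>%
  filter(if_all(everything(), ~ !is.na(.x)))
```

The last variant is closer to what the `across()` attempt in the question expresses, which is why that attempt removes partly filled rows as well.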