Filter data frame when all columns are NA in `dplyr`

Question

This is surely a simple question (if one knows the answer), but I still couldn't find guidance on SO: I have a data frame with lots of rows that contain only NA across all columns (after a lead operation). I want to remove those rows:

df <- structure(list(line = c("0001", NA, "0002", NA, "0003", NA, "0004", 
                              NA, "0005", NA), 
                     speaker = c(NA, NA, "ID16.C-U", NA, NA, NA, "ID16.B-U", NA, NA, NA), 
                     utterance = c("7.060", NA, "  ah-ha,", NA, "0.304", NA, "  °°yes°°", NA, "7.740", NA), 
                     timestamp = c(NA, "00:00:00.000 - 00:00:07.060", NA, "00:00:07.060 - 00:00:07.660", NA, 
                                   "00:00:07.660 - 00:00:07.964", NA, "00:00:07.964 - 00:00:08.610", NA, 
                                   "00:00:08.610 - 00:00:16.350")), row.names = c(NA, 10L), class = "data.frame")

But neither this:

df %>%
  mutate(timestamp = lead(timestamp)) %>%
  filter(across(everything(), ~!is.na(.)))

nor this works:

df %>%
  mutate(timestamp = lead(timestamp)) %>%
  rowwise() %>%
  filter(c_across(everything(), ~!is.na(.)))

What's the solution?

Expected:

  line  speaker utterance                   timestamp
1 0001     <NA>     7.060 00:00:00.000 - 00:00:07.060
3 0002 ID16.C-U    ah-ha, 00:00:07.060 - 00:00:07.660
5 0003     <NA>     0.304 00:00:07.660 - 00:00:07.964
7 0004 ID16.B-U   °°yes°° 00:00:07.964 - 00:00:08.610
9 0005     <NA>     7.740 00:00:08.610 - 00:00:16.350

Answer 1

Will this work?

df <- df %>% mutate(timestamp = lead(timestamp))
df[rowSums(is.na(df))!=ncol(df),]

Pseudo-tidyverse version:

df %>% 
  dplyr::mutate(timestamp = dplyr::lead(timestamp)) %>% 
  dplyr::filter(rowSums(is.na(.))!=ncol(.))
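A dplyr-free sketch of the same idea, assuming only base R: `c(x[-1], NA)` stands in for `dplyr::lead(x)` (shift up by one row, pad the end with NA), and `rowSums(!is.na(df)) > 0` is the same test as `rowSums(is.na(df)) != ncol(df)`, phrased as "at least one non-NA value". The name `res` is only for illustration.

```r
# Data from the question, rebuilt with data.frame() for readability.
df <- data.frame(
  line      = c("0001", NA, "0002", NA, "0003", NA, "0004", NA, "0005", NA),
  speaker   = c(NA, NA, "ID16.C-U", NA, NA, NA, "ID16.B-U", NA, NA, NA),
  utterance = c("7.060", NA, "  ah-ha,", NA, "0.304", NA, "  °°yes°°", NA, "7.740", NA),
  timestamp = c(NA, "00:00:00.000 - 00:00:07.060", NA, "00:00:07.060 - 00:00:07.660", NA,
                "00:00:07.660 - 00:00:07.964", NA, "00:00:07.964 - 00:00:08.610", NA,
                "00:00:08.610 - 00:00:16.350"),
  stringsAsFactors = FALSE
)

# Shift timestamp up by one row; c(x[-1], NA) mirrors dplyr::lead(x) with n = 1.
df$timestamp <- c(df$timestamp[-1], NA)

# Keep rows that have at least one non-NA value, i.e. drop all-NA rows.
res <- df[rowSums(!is.na(df)) > 0, ]
print(res)
```

Because this subsets with `[`, the original row names (1, 3, 5, 7, 9) are kept, matching the expected output in the question.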

Answer 2

dplyr has new functions if_all() and if_any() to handle cases like these:

library(dplyr, warn.conflicts = FALSE)

df %>% 
    mutate(timestamp = lead(timestamp)) %>%
    filter(!if_all(everything(), is.na))
#>   line  speaker utterance                   timestamp
#> 1 0001     <NA>     7.060 00:00:00.000 - 00:00:07.060
#> 2 0002 ID16.C-U    ah-ha, 00:00:07.060 - 00:00:07.660
#> 3 0003     <NA>     0.304 00:00:07.660 - 00:00:07.964
#> 4 0004 ID16.B-U   °°yes°° 00:00:07.964 - 00:00:08.610
#> 5 0005     <NA>     7.740 00:00:08.610 - 00:00:16.350
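A sketch contrasting the two conditions, assuming a dplyr version that provides `if_all()` and `if_any()`. Requiring `~ !is.na(.x)` to hold for every column keeps only complete rows, which is roughly what the first attempt in the question asks for, so rows whose `speaker` is NA are dropped as well. Negating `if_all(everything(), is.na)` drops only rows that are entirely NA, and by De Morgan's law `if_any(everything(), ~ !is.na(.x))` selects the same rows. (The second attempt fails for a different reason: `c_across()` takes only a column selection, not a function.) The object names `shifted`, `complete_only`, `not_all_na` and `any_non_na` are hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)

# Data from the question, rebuilt with data.frame() for readability.
df <- data.frame(
  line      = c("0001", NA, "0002", NA, "0003", NA, "0004", NA, "0005", NA),
  speaker   = c(NA, NA, "ID16.C-U", NA, NA, NA, "ID16.B-U", NA, NA, NA),
  utterance = c("7.060", NA, "  ah-ha,", NA, "0.304", NA, "  °°yes°°", NA, "7.740", NA),
  timestamp = c(NA, "00:00:00.000 - 00:00:07.060", NA, "00:00:07.060 - 00:00:07.660", NA,
                "00:00:07.660 - 00:00:07.964", NA, "00:00:07.964 - 00:00:08.610", NA,
                "00:00:08.610 - 00:00:16.350"),
  stringsAsFactors = FALSE
)

shifted <- df %>% mutate(timestamp = lead(timestamp))

# Every column non-NA: complete rows only (stricter than the goal).
complete_only <- shifted %>% filter(if_all(everything(), ~ !is.na(.x)))

# Not every column NA: drops only the all-NA rows (the filter used above).
not_all_na <- shifted %>% filter(!if_all(everything(), is.na))

# Equivalent form: at least one column non-NA.
any_non_na <- shifted %>% filter(if_any(everything(), ~ !is.na(.x)))

nrow(complete_only)  # 2: only the rows where speaker is also filled in
nrow(not_all_na)     # 5
```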

2 answers

Answer 1
1 2021-10-26 08:58:51

Answer 2
1 Accepted 2021-10-26 09:04:22