Remove groups where a column is all NA, using data.table or dplyr in R

Question

dataHAVE = data.frame("student"=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
"time"=c(1,2,3,1,2,3,1,2,3,NA,NA,NA,NA,2,3),
"score"=c(7,9,5,NA,NA,NA,NA,3,9,NA,NA,NA,7,NA,5))



dataWANT=data.frame("student"=c(1,1,1,3,3,3,5,5,5),
"time"=c(1,2,3,1,2,3,NA,2,3),
"score"=c(7,9,5,NA,3,9,7,NA,5))

I have a tall data frame, and I want to remove student IDs that have NA for all 'score' values or for all 'time' values. This applies only when every value is NA; if only some values are NA, I want to keep all of that student's records.

Answer 1

Is this what you want?

library(dplyr)

dataHAVE %>%
    group_by(student) %>%
    filter(!all(is.na(score)))

  student  time score
    <dbl> <dbl> <dbl>
1       1     1     7
2       1     2     9
3       1     3     5
4       3     1    NA
5       3     2     3
6       3     3     9
7       5    NA     7
8       5     2    NA
9       5     3     5

Each student is kept only if not (!) all of its score values are NA.
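Note that the filter above tests only score. It reproduces dataWANT for this data because the one student whose time is entirely NA (student 4) also has an entirely NA score. A minimal sketch of a variant that tests both columns, assuming the same dataHAVE as in the question (the name result is just for illustration):

```r
library(dplyr)

dataHAVE <- data.frame(
  student = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
  time    = c(1,2,3,1,2,3,1,2,3,NA,NA,NA,NA,2,3),
  score   = c(7,9,5,NA,NA,NA,NA,3,9,NA,NA,NA,7,NA,5)
)

# Keep a student only if neither column is entirely NA within the group.
# Comma-separated conditions in filter() are combined with AND.
result <- dataHAVE %>%
  group_by(student) %>%
  filter(!all(is.na(time)), !all(is.na(score))) %>%
  ungroup()

unique(result$student)  # students 1, 3 and 5 remain
```

The ungroup() at the end is optional; it just returns a plain tibble so later steps are not accidentally run per student.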

Answer 2

Since nobody has suggested one, here is a solution using data.table:

  library(data.table)
  dataHAVE = data.table("student"=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
                        "time"=c(1,2,3,1,2,3,1,2,3,NA,NA,NA,NA,2,3),
                        "score"=c(7,9,5,NA,NA,NA,NA,3,9,NA,NA,NA,7,NA,5))

Edit:

Previous, incorrect code:

dataHAVE[, .SD[!(all(is.na(time)) & all(is.na(score)))], by = student]

New, correct code:

dataHAVE[, .SD[!(all(is.na(time)) | all(is.na(score)))], by = student]

Returns:

   student time score
1:       1    1     7
2:       1    2     9
3:       1    3     5
4:       3    1    NA
5:       3    2     3
6:       3    3     9
7:       5   NA     7
8:       5    2    NA
9:       5    3     5
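To see why `&` was wrong and `|` is right, it may help to look at the per-student flags directly. The following is only an illustrative base R sketch (the names all_na_time, all_na_score, dropped_and and dropped_or are made up for this example):

```r
dataHAVE <- data.frame(
  student = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
  time    = c(1,2,3,1,2,3,1,2,3,NA,NA,NA,NA,2,3),
  score   = c(7,9,5,NA,NA,NA,NA,3,9,NA,NA,NA,7,NA,5)
)

# One flag per student: is the whole column NA for that student?
all_na <- function(v) all(is.na(v))
all_na_time  <- vapply(split(dataHAVE$time,  dataHAVE$student), all_na, logical(1))
all_na_score <- vapply(split(dataHAVE$score, dataHAVE$student), all_na, logical(1))

# Students removed by each version of the condition
dropped_and <- names(which(all_na_time & all_na_score))  # "4"
dropped_or  <- names(which(all_na_time | all_na_score))  # "2" "4"
```

Student 2 has a complete time column but an all-NA score column, so only the `|` version removes it, which is what the question asks for.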

Edit:

Updated the data.table solution with @Cole's suggestion...

Answer 3

Here is a base R solution using subset + ave:

dataWANT <- subset(dataHAVE,!(ave(time,student,FUN = function(v) all(is.na(v))) | ave(score,student,FUN = function(v) all(is.na(v)))))

or

dataWANT <- subset(dataHAVE,
                   !Reduce(`|`,Map(function(x) ave(get(x),student,FUN = function(v) all(is.na(v))), c("time","score"))))
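Both versions rely on ave() broadcasting a per-group result back to every row of that group. A small sketch on a cut-down vector (the names flag and keep are only for illustration) shows the shape of what ave() returns:

```r
student <- c(1, 1, 1, 2, 2, 2)
score   <- c(7, 9, 5, NA, NA, NA)

# ave() applies FUN per group and repeats the result for each row of the group.
# Because score is numeric, the logical result is stored as 0/1.
flag <- ave(score, student, FUN = function(v) all(is.na(v)))
flag  # 0 0 0 1 1 1

# Negating the 0/1 flag gives the rows to keep
keep <- !flag
```

This is why the result can be combined with `|` and negated with `!` inside subset(): 0 is treated as FALSE and 1 as TRUE.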

Answer 4

Another option:

library(data.table)
setDT(dataHAVE, key="student")
dataHAVE[!student %in% dataHAVE[, if(any(colSums(is.na(.SD))==.N)) student, student]$V1]
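The inner query works by comparing the number of NAs in each column with the group size .N. As a rough sketch of that intermediate step (dt, na_summary and drop_ids are names chosen for this example, not part of the answer):

```r
library(data.table)

dt <- data.table(
  student = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
  time    = c(1,2,3,1,2,3,1,2,3,NA,NA,NA,NA,2,3),
  score   = c(7,9,5,NA,NA,NA,NA,3,9,NA,NA,NA,7,NA,5)
)

# Per student: NA count in each column, plus the group size n
na_summary <- dt[, c(as.list(colSums(is.na(.SD))), n = .N), by = student]

# A column is entirely NA when its NA count equals the group size
drop_ids <- na_summary[time == n | score == n, student]
drop_ids  # students 2 and 4

result <- dt[!student %in% drop_ids]
```

Materialising the summary table first makes it easy to inspect which students are about to be dropped before filtering.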

Answer 5

Create a dummy variable, and filter based on that:

library("dplyr")

dataHAVE = data.frame("student"=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
                      "time"=c(1,2,3,1,2,3,1,2,3,NA,NA,NA,NA,2,3),
                      "score"=c(7,9,5,NA,NA,NA,NA,3,9,NA,NA,NA,7,NA,5))

dataHAVE %>% 
  mutate(check=is.na(time)&is.na(score)) %>% 
  filter(check == FALSE) %>% 
  select(-check)
#>    student time score
#> 1        1    1     7
#> 2        1    2     9
#> 3        1    3     5
#> 4        2    1    NA
#> 5        2    2    NA
#> 6        2    3    NA
#> 7        3    1    NA
#> 8        3    2     3
#> 9        3    3     9
#> 10       5   NA     7
#> 11       5    2    NA
#> 12       5    3     5

Created on 2020-02-21 by the reprex package (v0.3.0)
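Note that the code above works row by row: it drops only rows where both time and score are NA, so student 2 is still in the output, unlike dataWANT. A possible group-wise version of the same dummy-variable idea, given here as a sketch rather than as the answerer's method:

```r
library(dplyr)

dataHAVE <- data.frame(
  student = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
  time    = c(1,2,3,1,2,3,1,2,3,NA,NA,NA,NA,2,3),
  score   = c(7,9,5,NA,NA,NA,NA,3,9,NA,NA,NA,7,NA,5)
)

result <- dataHAVE %>%
  group_by(student) %>%
  # check is TRUE for every row of a student whose time or score is all NA
  mutate(check = all(is.na(time)) | all(is.na(score))) %>%
  ungroup() %>%
  filter(!check) %>%
  select(-check)

unique(result$student)  # students 1, 3 and 5 remain
```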

Answer 6

data.table solution generalising to any number of columns:

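# dataHAVE must be a data.table here, e.g. after setDT(dataHAVE).
# sum(sapply(...)) counts the columns with at least one non-NA value;
# do.call("+", ...) only works for at most two columns.
dataHAVE[, 
         .SD[sum(sapply(.SD, function(x) any(!is.na(x)))) == ncol(.SD)], 
         by = student]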

#    student time score
# 1:       1    1     7
# 2:       1    2     9
# 3:       1    3     5
# 4:       3    1    NA
# 5:       3    2     3
# 6:       3    3     9
# 7:       5   NA     7
# 8:       5    2    NA
# 9:       5    3     5
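As a quick check that the per-column test scales beyond two columns, here is a small sketch with a third value column. The grade column and the names dt and kept are hypothetical, and all(sapply(...)) is used as one way to express "every column has at least one non-NA value":

```r
library(data.table)

dt <- data.table(
  student = c(1, 1, 2, 2, 3, 3),
  time    = c(1, 2, 1, 2, NA, NA),
  score   = c(7, NA, NA, NA, 4, 6),
  grade   = c(NA, 2, 3, 4, 5, NA)  # hypothetical extra column
)

# Keep a student only if every value column has at least one non-NA entry
kept <- dt[, .SD[all(sapply(.SD, function(x) any(!is.na(x))))], by = student]

unique(kept$student)  # only student 1 remains
```

Student 2 is dropped because score is all NA, and student 3 because time is all NA; student 1 has some NAs in score and grade but no column that is entirely NA.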

6 answers

Answer 1: 2 votes, accepted, 2020-02-21 12:03:16
Answer 2: 2 votes, 2020-02-21 12:06:26
Answer 3: 1 vote, 2020-02-21 12:02:06
Answer 4: 1 vote, 2020-02-21 22:30:46
Answer 5: 0 votes, 2020-02-21 12:06:02
Answer 6: 0 votes, 2020-02-21 13:48:24
