
Set a threshold for complete cases to remove NA from multiple columns in R

There might be an easy answer to this, but I am not able to make it work. I have a data table that looks like this:

df <- data.table(t = c(1, 2, 3), a = c(NA, NA, 4), b = c(NA, 4, NA), c = c(NA, 4, NA))

How can I remove only the rows where all columns except "t" are NA? It needs to be fast because my data files are large, so I would particularly like to do it with complete.cases. I have not found a solution to this problem yet.

The result should look like this:

dfRes <- data.table(t = c(2, 3), a = c(NA, 4), b = c(4, NA), c = c(4, NA))

We can use rowSums on the columns other than "t": count the non-NA values in each row and keep the rows where that count is greater than zero.

library(data.table)

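# positions of every column except "t"
cols <- which(names(df) != 't')
# keep rows that have at least one non-NA value in those columns
df[rowSums(!is.na(df[, ..cols])) > 0, ]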

#   t  a  b  c
#1: 2 NA  4  4
#2: 3  4 NA NA
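Since the title asks about a threshold, the same rowSums idea can be generalized to "keep rows with at least k non-NA values". The following is only a sketch: the helper name keep_rows_with_data and its arguments id_cols and min_non_na are hypothetical and do not come from the original answer.

```r
library(data.table)

# Hypothetical helper (not part of data.table): keep rows that have at least
# `min_non_na` non-missing values among all columns except `id_cols`.
keep_rows_with_data <- function(dt, id_cols = "t", min_non_na = 1L) {
  value_cols <- setdiff(names(dt), id_cols)
  # is.na() on a data.table returns a logical matrix; rowSums counts per row
  non_na_count <- rowSums(!is.na(dt[, value_cols, with = FALSE]))
  dt[non_na_count >= min_non_na]
}

df <- data.table(t = c(1, 2, 3), a = c(NA, NA, 4), b = c(NA, 4, NA), c = c(NA, 4, NA))

res1 <- keep_rows_with_data(df)                   # threshold 1: rows t = 2 and t = 3
res2 <- keep_rows_with_data(df, min_non_na = 2L)  # threshold 2: only row t = 2
```

With the default threshold of 1 this reproduces the result above; raising min_non_na makes the filter stricter.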

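Alternatively, we can use complete.cases with Reduce. Applied to a single column, complete.cases returns TRUE where the value is not NA, and Reduce with `|` keeps a row if any of the columns a to c has a value.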

library(data.table)
df[df[, Reduce(`|`, lapply(.SD, complete.cases)), .SDcols = a:c]]
#   t  a  b  c
#1: 2 NA  4  4
#2: 3  4 NA NA
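For arbitrary column names, the same column-by-column reduction can be written without .SD. This is a sketch under assumptions: the variable names value_cols and has_data are illustrative, and whether it is faster than the rowSums version on large data has not been benchmarked here. It combines one logical vector per column instead of first building a logical matrix over all columns.

```r
library(data.table)

df <- data.table(t = c(1, 2, 3), a = c(NA, NA, 4), b = c(NA, 4, NA), c = c(NA, 4, NA))

# every column except the id column "t"
value_cols <- setdiff(names(df), "t")

# one logical vector per column (TRUE where a value is present),
# combined with `|` so a row is kept if any column has a value
has_data <- Reduce(`|`, lapply(value_cols, function(col) !is.na(df[[col]])))

res <- df[has_data]
```

Here res holds the rows with t = 2 and t = 3, the same rows as dfRes in the question.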
