Set a threshold for complete cases to remove NA from multiple columns in R
There might be an easy answer to this, but I am not able to make it work. I have a data table that looks like this:
df <- data.table(t = c(1, 2, 3), a = c(NA, NA, 4), b = c(NA, 4, NA), c = c(NA, 4, NA))
How can I remove only the rows where all columns except "t" are NA? It should be fast because my data files are large, and I would especially like to do it with complete.cases. I haven't found a solution to this problem yet.
The result should look like this:
dfRes <- data.table(t = c(2, 3), a = c(NA, 4), b = c(4, NA), c = c(4, NA))
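For context, a short sketch (assuming the data.table package is installed) of why calling complete.cases directly on the non-"t" columns does not give the desired result: with several columns it returns TRUE only for rows that contain no NA at all, and every row of this example has at least one NA among a, b and c.

```r
library(data.table)

df <- data.table(t = c(1, 2, 3), a = c(NA, NA, 4), b = c(NA, 4, NA), c = c(NA, 4, NA))
cols <- setdiff(names(df), "t")  # every column except "t"

# With several columns, complete.cases() is TRUE only for rows with no NA at all
cc_all <- complete.cases(df[, ..cols])
print(cc_all)  # FALSE FALSE FALSE

# Keeping those rows therefore drops everything, including the rows with t = 2 and t = 3
print(nrow(df[cc_all]))  # 0
```

So complete.cases has to be applied per column and the results combined, rather than to all columns at once.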
We can use rowSums on the columns other than "t":
library(data.table)
cols <- which(names(df) != 't')
df[rowSums(!is.na(df[, ..cols])) > 0, ]
# t a b c
#1: 2 NA 4 4
#2: 3 4 NA NA
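Since the title asks for a threshold, the same rowSums idea extends to keeping rows with at least `k` non-NA values outside "t". A minimal sketch; the helper name `keep_rows_with_min_non_na` and its arguments are hypothetical, chosen only for illustration. Note that `!is.na(...)` builds a full logical matrix of the selected columns, so memory use grows with rows times columns.

```r
library(data.table)

df <- data.table(t = c(1, 2, 3), a = c(NA, NA, 4), b = c(NA, 4, NA), c = c(NA, 4, NA))

# Hypothetical helper: keep rows that have at least `min_non_na` non-NA values
# among all columns except those named in `exclude`.
keep_rows_with_min_non_na <- function(dt, min_non_na = 1L, exclude = "t") {
  cols <- setdiff(names(dt), exclude)
  # is.na() on a data.table returns a logical matrix; rowSums counts TRUEs per row
  n_non_na <- rowSums(!is.na(dt[, ..cols]))
  dt[n_non_na >= min_non_na]
}

keep_rows_with_min_non_na(df, 1L)  # rows with t = 2 and t = 3
keep_rows_with_min_non_na(df, 2L)  # only the row with t = 2
```

With `min_non_na = 1L` this reproduces the answer above; raising it tightens the filter.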
We can use complete.cases with Reduce:
library(data.table)
df[df[, Reduce(`|`, lapply(.SD, complete.cases)), .SDcols = a:c]]
# t a b c
#1: 2 NA 4 4
#2: 3 4 NA NA
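Applied to a single vector, complete.cases(x) is TRUE exactly where x is not NA, so `lapply(.SD, complete.cases)` yields one logical vector per column and ``Reduce(`|`, ...)`` ORs them into "at least one non-NA value in this row". A small sketch, assuming the same `df`, that selects the columns with `setdiff()` instead of the hard-coded `a:c` range and checks that both answers keep the same rows:

```r
library(data.table)

df <- data.table(t = c(1, 2, 3), a = c(NA, NA, 4), b = c(NA, 4, NA), c = c(NA, 4, NA))
cols <- setdiff(names(df), "t")  # all columns except "t", instead of a:c

# One logical vector per column (TRUE = value present), OR-ed row-wise
any_non_na <- df[, Reduce(`|`, lapply(.SD, complete.cases)), .SDcols = cols]

res_reduce  <- df[any_non_na]
res_rowsums <- df[rowSums(!is.na(df[, ..cols])) > 0]

# Both approaches keep the same rows (t = 2 and t = 3)
stopifnot(isTRUE(all.equal(res_reduce, res_rowsums)))
```

Which of the two is faster on very large files is worth benchmarking on the real data.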