
Check row-wise NA sum in R data.table

Problem: I want to check whether a row consists solely of NAs (ignoring the first column, x) in a data.table object, and drop such rows. Currently, I have an implementation that relies on apply(). Is there a more efficient yet readable solution?

Any improvements and ideas are welcome! Thanks

library(data.table)

dt <- data.table(
  x = c("A", "B", "C", "D"),
  y = c("true", NA, NA, "true"),
  z = c(NA, NA, "true", "true"),
  a = c(NA, NA, NA, "ha")
)

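# Current code: for each row of columns 2..ncol(dt), flag it if every value is NA
# (apply() first converts these columns to a character matrix)
idx <- apply(dt[, 2:ncol(dt), with = FALSE], 1, function(x) all(is.na(x)))
dt <- dt[!idx]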

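# Code Attempt 1 (not so nice: it adds a temporary na_count column to dt by reference)
rel_cols <- names(dt)[!names(dt) %in% c("x")]
# ncol(dt) - 2 discounts x and na_count itself, i.e. it equals length(rel_cols)
dt[, na_count := rowSums(is.na(.SD)), .SDcols = rel_cols][na_count < (ncol(dt) - 2)]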

You can use rowSums() like this, reusing rel_cols from the question:

library(data.table)
dt[rowSums(!is.na(dt[, ..rel_cols])) > 0]

#   x    y    z    a
#1: A true <NA> <NA>
#2: C <NA> true <NA>
#3: D true true   ha
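To see why this filter works, it can help to look at the intermediate vector. The following is a minimal sketch, not part of the original answer; it rebuilds the question's sample data so it runs on its own, and the names non_na and kept are hypothetical:

```r
library(data.table)

# The question's sample data, rebuilt so this snippet is self-contained
dt <- data.table(
  x = c("A", "B", "C", "D"),
  y = c("true", NA, NA, "true"),
  z = c(NA, NA, "true", "true"),
  a = c(NA, NA, NA, "ha")
)
rel_cols <- setdiff(names(dt), "x")

# is.na() on the selected columns gives a logical matrix; negating it and
# summing each row counts the non-NA cells per row.
# For this data non_na holds 1, 0, 1, 3: row B has no non-NA values.
non_na <- rowSums(!is.na(dt[, ..rel_cols]))

# Keep rows with at least one non-NA value among rel_cols
kept <- dt[non_na > 0]
print(kept)
```

Comparing with `> 0` rather than with the number of columns means the filter does not need to know how many columns rel_cols contains.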

Or, using .SDcols:

dt[dt[, rowSums(!is.na(.SD)) > 0, .SDcols = rel_cols]]
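A possible variant, which is not from the original answers: instead of building the full logical matrix with is.na(.SD), combine one is.na() vector per column with base R's Reduce(). This is a sketch under the assumption that avoiding the n-by-k matrix matters for your data; whether it is actually faster has not been measured here, so benchmark before relying on it. The names all_na and result are hypothetical:

```r
library(data.table)

dt <- data.table(
  x = c("A", "B", "C", "D"),
  y = c("true", NA, NA, "true"),
  z = c(NA, NA, "true", "true"),
  a = c(NA, NA, NA, "ha")
)
rel_cols <- setdiff(names(dt), "x")

# lapply(.SD, is.na) yields one logical vector per column in rel_cols;
# Reduce(`&`, ...) ANDs them element-wise, so all_na[i] is TRUE only
# when every selected column is NA in row i
all_na <- dt[, Reduce(`&`, lapply(.SD, is.na)), .SDcols = rel_cols]

# Drop the all-NA rows (the same !idx idiom as in the question)
result <- dt[!all_na]
print(result)
```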

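Here is one option using base R's rowSums() and is.na() inside the data.table subset. It checks every column, including x, and selects the rows that are entirely NA:

library(data.table)

dt[rowSums(is.na(dt)) == ncol(dt)]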

      x    y    z    a
1: <NA> <NA> <NA> <NA>

Data:

dt <- data.table(
    x = c("A", NA, "C", "D"),
    y = c("true", NA, NA, "true"),
    z = c(NA, NA, "true", "true"),
    a = c(NA, NA, NA, "ha")
)

Note: I intentionally altered your sample data slightly so that the second row is entirely NA, to demonstrate that the answer works.
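On the question's original data, column x is never NA, so checking every column finds no rows; that is why the data had to be altered. A minimal sketch (not from the original answer) of the same rowSums()/is.na() idea restricted to the question's rel_cols, and negated so that it drops the all-NA rows instead of selecting them; the names all_cols_hits, na_count, rel_hits and cleaned are hypothetical:

```r
library(data.table)

# The question's original data: x is never NA
dt <- data.table(
  x = c("A", "B", "C", "D"),
  y = c("true", NA, NA, "true"),
  z = c(NA, NA, "true", "true"),
  a = c(NA, NA, NA, "ha")
)

# Checking every column: no row is entirely NA, so nothing is selected
all_cols_hits <- dt[rowSums(is.na(dt)) == ncol(dt)]

# Checking only the columns other than x, as the question does
rel_cols <- setdiff(names(dt), "x")
na_count <- rowSums(is.na(dt[, ..rel_cols]))

# Select the rows that are all NA in rel_cols ...
rel_hits <- dt[na_count == length(rel_cols)]
# ... or drop them, which is what the question asks for
cleaned <- dt[na_count < length(rel_cols)]
print(cleaned)
```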
