r data.table excluding rows with certain value in a column removes NAs too

Question

I came across this unexpected behaviour in data.table. Rows with NA in a column are removed when I exclude rows with a certain value in that column, as in this example:

library(data.table)

dt_mtcars <- setDT(copy(mtcars))

set.seed(42)
na_rows <- runif(3, min = 1, max = nrow(mtcars))

dt_mtcars[ na_rows, cyl := NA]

dt_mtcars[ is.na(cyl), .N]
#> [1] 3

dt_mtcars <- dt_mtcars[ cyl != 4]

dt_mtcars[ is.na(cyl), .N]
#> [1] 0

Created on 2022-01-27 by the reprex package (v2.0.1)

Excluding the rows like this instead

library(data.table)

dt_mtcars <- setDT(copy(mtcars))

set.seed(42)
na_rows <- runif(3, min = 1, max = nrow(mtcars))

dt_mtcars[ na_rows, cyl := NA]

dt_mtcars[ is.na(cyl), .N]
#> [1] 3

dt_mtcars <- dt_mtcars[ !cyl %in% 4]

dt_mtcars[ is.na(cyl), .N]
#> [1] 3


does have the expected result. Am I wrong to expect the same result in the first example above? Or is this a bug in data.table?

Answer 1

This isn't a data.table issue.

In the first case the NAs are not selected, because comparing NA with != returns NA rather than TRUE:

NA != 4
[1] NA

In the second case they are selected, because %in% never returns NA:

!NA %in% 4
[1] TRUE
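
If the goal is to drop the 4s while keeping the NA rows, the two comparisons above suggest a couple of ways to write the filter. The following is only a minimal sketch on a made-up six-row table (the name dt and its values are assumptions for illustration, not the mtcars data from the question). It relies on data.table treating an NA in a logical i as FALSE, which is why the plain != filter drops those rows.

```r
library(data.table)

# Hypothetical toy data: two 4s, two other values, two NAs
dt <- data.table(id = 1:6, cyl = c(4, 6, NA, 8, 4, NA))

# NA != 4 evaluates to NA, and an NA in a logical i is treated as FALSE,
# so the NA rows are dropped together with the 4s
n_ne <- dt[cyl != 4, .N]

# Keep the NA rows explicitly
n_explicit <- dt[is.na(cyl) | cyl != 4, .N]

# %in% returns FALSE (never NA) for NA, so negating it keeps the NA rows
n_in <- dt[!cyl %in% 4, .N]

print(c(n_ne, n_explicit, n_in))
```

The is.na(cyl) | cyl != 4 form states the intent to keep missing values outright, while !cyl %in% 4 is shorter but depends on the reader knowing how %in% handles NA.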
