I came across this unexpected behaviour in data.table. Rows with NAs in a certain column are removed when I exclude rows with a certain value, as in this example:
library(data.table)
dt_mtcars <- setDT(copy(mtcars))
set.seed(42)
na_rows <- runif(3, min = 1, max = nrow(mtcars))
dt_mtcars[ na_rows, cyl := NA]
dt_mtcars[ is.na(cyl), .N]
#> [1] 3
dt_mtcars <- dt_mtcars[ cyl != 4]
dt_mtcars[ is.na(cyl), .N]
#> [1] 0
Created on 2022-01-27 by the reprex package (v2.0.1)
Excluding the rows like this instead
library(data.table)
dt_mtcars <- setDT(copy(mtcars))
set.seed(42)
na_rows <- runif(3, min = 1, max = nrow(mtcars))
dt_mtcars[ na_rows, cyl := NA]
dt_mtcars[ is.na(cyl), .N]
#> [1] 3
dt_mtcars <- dt_mtcars[ !cyl %in% 4]
dt_mtcars[ is.na(cyl), .N]
#> [1] 3
gives the result I expected. Am I wrong to expect the same result from the first example above? Or is this a bug in data.table?
This isn't a data.table issue.
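In the first case you don't select the NAs, because comparing NA with a value returns NA rather than TRUE, and data.table treats an NA in a logical i as FALSE: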
NA != 4
[1] NA
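To see how that NA propagates into a row filter, here is a minimal sketch on a toy table. The table `dt` and its columns `id` and `cyl` are made up for illustration and are not from the question. It also shows that a base data.frame handles the same logical index differently, by returning an all-NA row.

```r
library(data.table)

# Toy table (hypothetical data, not from the question)
dt <- data.table(id = 1:4, cyl = c(4, 6, NA, 8))

# The comparison yields NA for the missing value
dt$cyl != 4
#> [1] FALSE  TRUE    NA  TRUE

# data.table drops rows where the logical i is NA
dt[cyl != 4, id]
#> [1] 2 4

# A base data.frame returns an all-NA row for the NA index instead
df <- as.data.frame(dt)
nrow(df[df$cyl != 4, ])
#> [1] 3
```

Either way, the row with the original NA value is not returned intact by `!=` alone.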
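In the second case you do, because %in% returns FALSE rather than NA for a missing value, and the negation turns that into TRUE: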
!NA %in% 4
[1] TRUE
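If you prefer to keep using `!=`, one option is to handle the NAs explicitly with `is.na()`. The following is a small sketch on a hypothetical toy table (`dt`, `id` and `cyl` are illustrative names), not the only way to do it.

```r
library(data.table)

# Toy table (hypothetical data, not from the question)
dt <- data.table(id = 1:4, cyl = c(4, 6, NA, 8))

# %in% never returns NA, so negating it keeps the NA row
!(dt$cyl %in% 4)
#> [1] FALSE  TRUE  TRUE  TRUE

# Same filter with !=, keeping NA rows explicitly
kept <- dt[cyl != 4 | is.na(cyl)]
kept$id
#> [1] 2 3 4
```

This works because `NA | TRUE` evaluates to `TRUE`, so the `is.na(cyl)` term rescues the rows that the comparison alone would drop.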