
R data.table: excluding rows with a certain value in a column also removes NAs

I came across this unexpected behaviour in data.table: rows with NA in a certain column are removed when I exclude rows with a certain value in that column, as in this example:

library(data.table)

dt_mtcars <- setDT(copy(mtcars))

set.seed(42)
na_rows <- sample(nrow(mtcars), 3)  # three distinct row numbers

dt_mtcars[ na_rows, cyl := NA]

dt_mtcars[ is.na(cyl), .N]
#> [1] 3

dt_mtcars <- dt_mtcars[ cyl != 4]

dt_mtcars[ is.na(cyl), .N]
#> [1] 0

Created on 2022-01-27 by the reprex package (v2.0.1)

Excluding the rows like this instead:

library(data.table)

dt_mtcars <- setDT(copy(mtcars))

set.seed(42)
na_rows <- sample(nrow(mtcars), 3)  # three distinct row numbers

dt_mtcars[ na_rows, cyl := NA]

dt_mtcars[ is.na(cyl), .N]
#> [1] 3

dt_mtcars <- dt_mtcars[ !cyl %in% 4]

dt_mtcars[ is.na(cyl), .N]
#> [1] 3

Created on 2022-01-27 by the reprex package (v2.0.1)

gives the expected result: the NA rows are kept. Am I wrong to expect the same result from the first example, or is this a bug in data.table?

This isn't a data.table issue; it follows from how comparisons with NA work in R.

In the first case the NA rows are not selected: NA != 4 evaluates to NA, and data.table treats a logical NA in i as FALSE:

NA != 4
[1] NA
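A minimal sketch of the first case on a small hypothetical table (the dt and id names are made up for illustration): the comparison is element-wise and yields NA for the missing value, data.table returns only the rows where i is TRUE, and base R's subset() drops the NA row in the same way.

```r
library(data.table)

# Hypothetical toy table with one missing cyl value (row 3)
dt <- data.table(id = 1:4, cyl = c(4, 6, NA, 8))

# Element-wise comparison: NA != 4 is NA, not TRUE
dt[, cyl != 4]
#> [1] FALSE  TRUE    NA  TRUE

# data.table treats the logical NA in i as FALSE, so row 3 is dropped
dt[cyl != 4]$id
#> [1] 2 4

# Base R's subset() drops the NA row as well
subset(as.data.frame(dt), cyl != 4)$id
#> [1] 2 4
```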

In the second case they are kept: %in% never returns NA, so NA %in% 4 is FALSE and its negation is TRUE:

!NA %in% 4
[1] TRUE
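A minimal sketch of the second case on a small hypothetical table (dt and id are illustrative names), plus an explicit alternative that keeps NA rows by adding an is.na() test to the != comparison:

```r
library(data.table)

# Hypothetical toy table with one missing cyl value (row 3)
dt <- data.table(id = 1:4, cyl = c(4, 6, NA, 8))

# %in% returns only TRUE or FALSE, so negating it never gives NA
!dt$cyl %in% 4
#> [1] FALSE  TRUE  TRUE  TRUE

# Row 3 (cyl is NA) is therefore kept
dt[!cyl %in% 4]$id
#> [1] 2 3 4

# Explicit alternative: keep NAs deliberately alongside the != test
dt[cyl != 4 | is.na(cyl)]$id
#> [1] 2 3 4
```

Both forms select the same rows here; the is.na() version makes the NA handling visible to a reader rather than relying on the semantics of %in%.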
