
R data.table behavior while filtering rows

I am creating a data.table in R and setting one column as its key. When I look up values in the table, the rows that have no match come back with NA values. I usually don't want that behavior in my search. Example below:

library(data.table) 
dt <- data.table('foo'=seq(10),bar=sample(letters,10))
setkey(dt,bar)
dt[sample(letters,5)]


> dt[sample(letters,5)]
   bar foo
1:   x   4
2:   q   2
3:   u   8
4:   s  NA
5:   b  NA

To remove the NA rows, simply set nomatch=0:

Here is an example (I removed the random sampling so that everyone gets the same results):

library(data.table)
dt = data.table(foo = 1:10, bar = letters[1:10])
setkey(dt, bar)
needed_letters = letters[c(1:8,11,12)] #1 - 8 are available, 11 and 12 are not
dt[J(needed_letters),nomatch=0]
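
To make the difference visible, here is a minimal sketch that runs the same lookup both ways and counts the rows. The names with_na, matched and filtered are only illustrative, and nomatch = NA is written out explicitly so the result does not depend on any option set elsewhere in the session. The last line shows a plain %in% filter, which is a different technique (a row filter rather than a keyed join) that gives the same rows here.

```r
library(data.table)

dt <- data.table(foo = 1:10, bar = letters[1:10])
setkey(dt, bar)
needed_letters <- letters[c(1:8, 11, 12)]  # "k" and "l" are not in dt

# Keyed join that keeps unmatched keys: foo is NA for "k" and "l"
with_na <- dt[J(needed_letters), nomatch = NA]

# Keyed join that drops unmatched keys
matched <- dt[J(needed_letters), nomatch = 0]

# Alternative (not a join): ordinary logical filter on the column
filtered <- dt[bar %in% needed_letters]

nrow(with_na)   # 10 rows, two of them with foo = NA
nrow(matched)   # 8 rows, no NA
nrow(filtered)  # 8 rows, no NA
```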

Addition from Matt

Also, if you prefer nomatch=0 to be the default, you can change the default globally:

options(datatable.nomatch=0)
dt[J(needed_letters)]    # now, no NAs will be returned
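
One caveat, offered as a hedged note rather than a definitive statement: a global option changes the behavior of every data.table lookup in the session, and newer data.table releases may no longer consult datatable.nomatch at all. If the option appears to have no effect in your version, passing the argument on each call is the safer route. The sketch below assumes a data.table version that accepts nomatch = NULL (as far as I know, 1.12.0 or later), which behaves like nomatch = 0; res is just an illustrative name.

```r
library(data.table)

dt <- data.table(foo = 1:10, bar = letters[1:10])
setkey(dt, bar)
needed_letters <- letters[c(1:8, 11, 12)]

# Per-call form: does not depend on any global option.
# Assumption: the installed data.table accepts nomatch = NULL.
res <- dt[J(needed_letters), nomatch = NULL]

nrow(res)       # 8: only the keys "a" to "h" are found in dt
anyNA(res$foo)  # FALSE
```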

You can check all the arguments and their defaults like this:

> args(data.table:::`[.data.table`)

function (x, i, j, by, keyby,
    with = TRUE,
    nomatch = getOption("datatable.nomatch"), 
    mult = "all",
    roll = FALSE,
    rollends = if (roll=="nearest") c(TRUE,TRUE)
               else if (roll>=0) c(FALSE, TRUE)
               else c(TRUE,FALSE),
    which = FALSE,
    .SDcols,
    verbose = getOption("datatable.verbose"), 
    allow.cartesian = getOption("datatable.allow.cartesian"), 
    drop = NULL) 

Any argument whose default is read via getOption can therefore have its default changed with options().

