简体   繁体   中英

How to filter rows in a datatable based on another datatable?

I want to filter a datatable(d1) for all rows sharing the same id as another datatable(d2).

d1 <- data.table(id = c('123','132','142', '153', '143'),
                 x1 = c('a','a','b','e','r'),
                 x2 = c('d','e','f','z','o'))
    
d2 <- data.table(id = c('123','132','142', '143', '165', '153', '143'),
                   y1= c('ABCDE', 'CDEFE', 'EFSAD', 'FEASD', 'SDSDS', 'VDCDS', 'DSXSZ' )

Expected output:

   id    y1
1: 123  ABCDE
2: 132  CDEFE
3: 142  EFSAD
4: 153  VDCDS
5: 143  DSXSZ

I have tried the simple code:

d2[id %in% d1$id]

But R gets stuck when running the code and the stop sign to stop the code does not show up and ESC when in the console does not work. I need to restart R to stop the code.

I know how to do this in dplyr

d2 %>%
 group_by(id) %>%
  filter(lopnr %in% d1$lopnr)

but i need to do it in data.table since I am dealing with large dataframes and it takes too long time in dplyr.

You can use this approach:

library(data.table)

d2[d1, on = .(id), nomatch = NULL][, c("id", "y1")]

Output

    id    y1
1: 123 ABCDE
2: 132 CDEFE
3: 142 EFSAD
4: 153 VDCDS
5: 143 FEASD
6: 143 DSXSZ
newdf <- merge(d2,d1, all = FALSE)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM