简体   繁体   中英

trying to subset a data table in R by removing items that are in a 2nd table

I have two data frames (from a csv file) in R as such:

df1 <- data.frame(V1 = 1:9, V2 = LETTERS[1:9])
df2 <- data.frame(V1 = 1:3, V2 = LETTERS[1:3])

I convert both to data.table as follows:

dt1 <- data.table(df1, key="V1")
dt2 <- data.table(df2, key="V1")

I want to now return a table that looks like dt1 but without any rows where the key is found in dt2 . So in this instance I would like to get back:

4 D
5 E
... 
9 I

I'm using the following code in R:

dt3 <- dt1[!dt2$V1]

this works on this example, however when I try it for a large data set (100k) it does not work. It only removes 2 rows, and I know it should be a lot more than that. is there a limit to this type of operation or something else I havent considered?

Because the tables are keyed, you can do this with a "not-join"

dt1 <- data.table(rep(1:3,2), LETTERS[1:6], key="V1")
#    V1 V2
# 1:  1  A
# 2:  1  D
# 3:  2  B
# 4:  2  E
# 5:  3  C
# 6:  3  F
dt2 <- data.table(1:2, letters[1:2], key="V1")
#    V1 V2
# 1:  1  a
# 2:  2  b

dt1[!.(dt2$V1)]
#    V1 V2
# 1:  3  C
# 2:  3  F

According to the documentation , . or J should not be necessary, since the ! alone is enough:

All types of i may be prefixed with ! . This signals a not-join or not-select should be performed.

However, the OP's code does not work:

dt1[!(dt2$V1)]
#    V1 V2
# 1:  2  B
# 2:  2  E
# 3:  3  C
# 4:  3  F

In this case, dt2$V1 is read as a vector of row numbers, not as part of a join. Looks like this is what is meant by a "not-select", but I think it could be more explicit. When I read the sentence above, for all I know "not-select" and "not-join" are two terms for the same thing.

Drop the column name "V1" to do a not-join. The tables are already keyed by V1.

dt3 <- dt1[!dt2]

You could try:

dt1[!(dt1$V1 %in% dt2$V1)]

This assumes that you don't care about ordering.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM