
How do I identify, by row id, the values in a data frame column that are not in another data frame's column?

How do I identify, by row id, the values in column c3 of data frame d2 that are not in column c1 of data frame d1? My which() call returns every row id when I subset as shown below. I need to keep this bracket-subsetting structure rather than the value$field form, which does work:

c1 <- c("A", "B", "C", "D", "E")
c2 <- c("a", "b", "c", "d", "e")

c3 <- c("A", "z", "C", "z", "E", "F")
c4 <- c("a", "x", "x", "d", "e", "f")

d1 <- data.frame(c1, c2, stringsAsFactors = FALSE)
d2 <- data.frame(c3, c4, stringsAsFactors = FALSE)

x <- unique(d1["c1"])
y <- d2[,"c3"]

id <- which(!(y %in% x) )  # incorrect, all row ids returned

I am trying to find the ids of the rows of d2 whose c3 value does not appear among the values in x.

I believe setdiff would work here. I take it that z and F are the values you want? They are in d2[,"c3"] but not in d1[,"c1"].

includes <- setdiff(d2[,"c3"], d1[,"c1"])

d2_new <- d2[d2[,"c3"] %in% includes,]

d2_new$id <- rownames(d2_new)
d2_new

# or 

ids <- rownames(d2[d2[,"c3"] %in% includes,])

Output:

d2_new

#  c3 c4 id
#2  z  x  2
#4  z  d  4
#6  F  f  6

ids
#[1] "2" "4" "6"
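
As a side note on why the original which() call returns every row: d1["c1"] (single brackets, no comma) returns a one-column data frame, not a character vector, so y %in% x compares each value of y against a list and nothing matches. A minimal sketch of a fix that keeps the bracket-subsetting style, assuming the same d1 and d2 as in the question, is to use d1[, "c1"], which drops to a plain vector:

```r
# Same data as in the question
d1 <- data.frame(c1 = c("A", "B", "C", "D", "E"),
                 c2 = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(c3 = c("A", "z", "C", "z", "E", "F"),
                 c4 = c("a", "x", "x", "d", "e", "f"),
                 stringsAsFactors = FALSE)

# d1["c1"] keeps the data frame wrapper; d1[, "c1"] drops to a character vector
is.data.frame(d1["c1"])   # TRUE
is.character(d1[, "c1"])  # TRUE

x <- unique(d1[, "c1"])   # character vector of the distinct c1 values
y <- d2[, "c3"]

# Positions in y whose value does not occur in x
id <- which(!(y %in% x))
id
# [1] 2 4 6
```

Here id holds integer row positions, whereas rownames() in the answer above returns them as character strings; d1[["c1"]] should behave the same way as d1[, "c1"] if you prefer double brackets.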
