I have been struggling with this code for some time. I have a vector of unique IDs, "EID", of length 821, extracted from one of my data frames (skate). It looks like this:
> head(skate$EID)
[1] "896-19" "895-8" "899-1" "899-5" "899-8" "895-7"
I would like to remove entire rows from another data frame (t5) wherever t5$EID matches any value in skate$EID.
I was able to extract the rows of t5 whose EID matches, as follows:
> xx<-skate$EID
> t5[match(xx, t5[, 26]), ]  # gives me a data frame of all matching EIDs in skate$EID
     record.t trip set month stratum NAFO unit.area time dur.set distance
8948        5  896  19    11     221   2J       N12  908      15        8
8849        5  895   8    10     766   3O       R36 1650      16        8
9289        5  899   1    12     743   3L       V26 2052      15        8
9299        5  899   5    12     746   3L       W27 1129      14        7
Here t5[,26] corresponds to the t5$EID column. I'm sure it's simple, but I'm not sure how to remove these rows from my t5 data frame. Any tips would be much appreciated. Thank you!
There are many ways to do this. To test for elements of vector A that are not in vector B, you can combine !, R's logical negation operator (see ?"!"), with %in% (see ?"%in%"). You then use the result of that test to indicate which rows to keep.
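To make that test concrete, here is a minimal sketch that runs it one step at a time on two small vectors. The names A, B, in_B and not_in_B are made up for illustration and are not part of the question's data.

```r
# Hypothetical vectors, for illustration only
A <- c("896-19", "camel", "899-1", "goat")
B <- c("896-19", "899-1", "899-5")

# %in% returns one TRUE/FALSE per element of A: is it found anywhere in B?
in_B <- A %in% B

# ! flips each value, so TRUE now marks the elements of A that are NOT in B
not_in_B <- !in_B

# Indexing with the logical vector keeps only the TRUE positions
A[not_in_B]
```

The same logical vector can be used in the row position of a data frame, which is what the methods below do.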
# Create two example data.frames
skate <- data.frame(EID = c("896-19", "895-8", "899-1", "899-5"),
                    score = 1:4)
t5 <- data.frame(EID = c("896-19", "camel", "899-1", "goat", "899-1"),
                 score = 105:101)
# Method 1
t5[!t5$EID %in% skate$EID, ]
# Method 2 (using the very handy subset() function)
subset(t5, !EID %in% skate$EID)
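If you would rather stay close to the match() call from the question, one possible variant is sketched below. It assumes the same example data frames as above. The names idx_question, idx and t5_clean are hypothetical. The key change is the argument order: looking up each t5$EID in skate$EID gives one result per row of t5, with NA for rows that have no match.

```r
# Same example data as above (stringsAsFactors = FALSE keeps EID as character
# in older R versions too)
skate <- data.frame(EID = c("896-19", "895-8", "899-1", "899-5"),
                    score = 1:4, stringsAsFactors = FALSE)
t5 <- data.frame(EID = c("896-19", "camel", "899-1", "goat", "899-1"),
                 score = 105:101, stringsAsFactors = FALSE)

# Direction used in the question: for each skate EID, the position of its
# FIRST match in t5. It yields NA for skate EIDs absent from t5, and it never
# points at the repeated "899-1" in row 5 of t5.
idx_question <- match(skate$EID, t5$EID)

# Reversed direction: for each row of t5, the position of its EID in skate,
# or NA when that EID does not occur in skate at all
idx <- match(t5$EID, skate$EID)

# Keep only the rows of t5 that found no match
t5_clean <- t5[is.na(idx), ]
t5_clean
```

This should select the same rows as the two methods above, since %in% is itself defined in terms of match(). The %in% form is usually easier to read.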