
Extract unique rows with a condition in R

I have this kind of data:

x <- matrix(c(2,2,3,3,3,4,4,20,33,2,3,45,6,9,45,454,7,4,6,7,5), nrow = 7, ncol = 3)

In the real dataset, I have a huge matrix with a lot of columns. For each distinct value in the first column (Id), I want to keep the single row that has the minimum value in the third column. For instance, for this matrix I would expect

y <- matrix(c(2,3,4,20,3,9,45,4,5), nrow = 3, ncol = 3)

I tried a lot of things but couldn't figure it out. Any help is appreciated.

Thanks in advance, Zeray

Here's a version that is more complicated, but considerably faster than Chase's ddply solution: some 200x faster in the test below :-)

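 uniqueMin <- function(m, idCol = 1L, minCol = ncol(m)) {
    # Split row indices by id, then keep each group's row with the smallest minCol value
    t(vapply(split(seq_len(nrow(m)), m[, idCol]),
             function(i, x, minCol) x[i, , drop = FALSE][which.min(x[i, minCol]), ],
             m[1, ], x = m, minCol = minCol))
 }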
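To see what the one-liner is doing, it can help to unpack it into explicit steps. The following is only a sketch applied to the question's matrix x; the function name uniqueMinVerbose and its intermediate variables are illustrative and not part of the original answer. It selects the same rows but does not attach the ids as row names.

```r
# Example matrix from the question
x <- matrix(c(2,2,3,3,3,4,4,20,33,2,3,45,6,9,45,454,7,4,6,7,5), nrow = 7, ncol = 3)

# Hypothetical step-by-step variant of uniqueMin (name is illustrative)
uniqueMinVerbose <- function(m, idCol = 1L, minCol = ncol(m)) {
  # Row indices grouped by the value in the id column
  groups <- split(seq_len(nrow(m)), m[, idCol])
  # Within each group, find the row index where minCol is smallest
  keep <- vapply(groups, function(i) i[which.min(m[i, minCol])], integer(1))
  # Return the selected rows, still as a matrix
  m[keep, , drop = FALSE]
}

res <- uniqueMinVerbose(x)
print(res)
```

For the question's data this keeps rows 1, 4 and 7 of x, which is the expected matrix y.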

And the following test code:

library(plyr)  # needed for the ddply comparison
nRows <- 10000
nCols <- 100
ids <- nRows/5  # ids are sampled from 1..2000
m <- cbind(sample(ids, nRows, TRUE), matrix(runif(nRows*nCols), nRows))
system.time( a<-uniqueMin(m, minCol=3L) ) # 0.07
system.time(ddply(as.data.frame(m), "V1", function(x) x[which.min(x$V3) ,])) # 15.72
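The timings alone do not show that the fast version picks the right rows. As a rough sanity check (a sketch, not part of the original answer), the per-id minimum of column 3 can be computed independently with tapply and compared against the result. The fixed seed and the names expected, oneRowPerId and minsMatch are assumptions added for illustration.

```r
# uniqueMin as defined in the answer above
uniqueMin <- function(m, idCol = 1L, minCol = ncol(m)) {
  t(vapply(split(seq_len(nrow(m)), m[, idCol]),
           function(i, x, minCol) x[i, , drop = FALSE][which.min(x[i, minCol]), ],
           m[1, ], x = m, minCol = minCol))
}

set.seed(1)  # assumption: fixed seed so the check is reproducible
nRows <- 10000
nCols <- 100
m <- cbind(sample(nRows / 5, nRows, TRUE), matrix(runif(nRows * nCols), nRows))
a <- uniqueMin(m, minCol = 3L)

# Minimum of column 3 per id, computed independently of uniqueMin
expected <- tapply(m[, 3], m[, 1], min)

# One row per distinct id, and each row carries its group's minimum
oneRowPerId <- nrow(a) == length(unique(m[, 1]))
minsMatch <- isTRUE(all.equal(unname(a[, 3]), as.vector(expected)))
print(c(oneRowPerId = oneRowPerId, minsMatch = minsMatch))
```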

You can use the plyr package. Convert the matrix to a data.frame so you can group on the first column, then use which.min to extract the row with the minimum third-column value in each group:

library(plyr)
ddply(as.data.frame(x), "V1", function(x) x[which.min(x$V3) ,])
  V1 V2 V3
1  2 20 45
2  3  3  4
3  4  9  5
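If adding a package is not an option, the same group-then-take-the-minimum idea can be sketched in base R. This is a different technique from either answer: sort by id and then by the third column, so that each id's minimum row comes first, and keep the first row per id. The variable names ord and y are illustrative.

```r
# Example matrix from the question
x <- matrix(c(2,2,3,3,3,4,4,20,33,2,3,45,6,9,45,454,7,4,6,7,5), nrow = 7, ncol = 3)

# Sort by id (column 1), then by column 3, so each id's minimum row comes first
ord <- x[order(x[, 1], x[, 3]), , drop = FALSE]

# Keep only the first row for each id
y <- ord[!duplicated(ord[, 1]), , drop = FALSE]
print(y)
```

The result stays a matrix, so no conversion to a data.frame is needed. When several rows tie for the minimum, this keeps whichever of them sorts first.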
