简体   繁体   中英

Fastest way to sort and desort rows of a matrix [r]

I have a matrix filled with somewhat random elements. I need every row sorted in decreasing order, then a function is called on the matrix, and finally the resulting matrix needs to be unsorted to the original order.

This is quickly accomplished vector-wise as shown here , but what's the fastest way to do this to every row in a matrix? Right now I'm doing:

# Example matrix
m <- matrix(runif(100), nrow = 25, ncol = 4)

# Get the initial order by row
om <- t(apply(m, 1, order, decreasing = T))

sm <- m
for (i in seq_len(nrow(m))) {
  sm[i, ] <- sm[i, om[i, ]]
}

# ** Operations performed on sm **

# Then unsort
for (i in seq_len(nrow(m))) {
  sm[i, ] <- sm[i, order(om[i, ])]
}

# sm is now sorted by-row in the same order as m

Is there some way given om in the above to sort and unsort while avoiding the for loop or an apply function (both of which make this operation very slow for big m). Thanks!

Edit: There are pointers here: Fastest way to sort each row of a large matrix in R The operation is done inside a function that is already called using parallel, so this operation must be done using serial code.

Row-wise sorting seems to be straightforward. To get the original order back (un-sort) we need the row-wise ranks rather than their order. Thereafter, what works for column sorting in @Josh O'Brien's answer we can adapt for rows.

Base R solution:

rr <- t(apply(m, 1, rank))  ## get initial RANKS by row
sm <- t(apply(m, 1, sort))  ## sort m

##  DOING STUFF HERE  ##

sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]  ## un-sort
all(m == sm)  ## check
# [1] TRUE

Seems to work.

In your linked answer, the rowSort function of the Rfast package stands out well in terms of performance, which may cover the sorting issue. Moreover there's also a rowRanks function that will cover our ranking issue. So we can avoid apply .

Let's try it out.

m[1:3, ]
#           [,1]      [,2]      [,3]        [,4]
# [1,] 0.9148060 0.5142118 0.3334272 0.719355838
# [2,] 0.9370754 0.3902035 0.3467482 0.007884739
# [3,] 0.2861395 0.9057381 0.3984854 0.375489965

library(Rfast)
rr <- rowRanks(m)  ## get initial RANKS by row
sm <- rowSort(m)   ## sort m
sm[1:3, ]  # check
#            [,1]      [,2]      [,3]      [,4]
# [1,] 0.36106962 0.4112159 0.6262453 0.6311956
# [2,] 0.01405302 0.2171577 0.5459867 0.6836634
# [3,] 0.07196981 0.2165673 0.5739766 0.6737271

##  DOING STUFF HERE  ##

sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]  ## un-sort
all(sm == m)  ## check
# [1] TRUE

Dito.

Benchmark

m.test <- matrix(runif(4e6), ncol = 4)
dim(m.test)
# [1] 1000000       4

# Unit: milliseconds
#     expr        min       lq       mean     median         uq        max neval cld
#    Rfast   897.6286   910.91   956.6259   924.1914   986.1246   1048.058     3 a  
#    baseR 87931.2824 88004.73 95659.8671 88078.1737 99524.1594 110970.145     3   c
#  forloop 58927.7784 59434.54 60317.3903 59941.2930 61012.1963  62083.100     3  b 

Not so bad!!


Data/Code:

set.seed(42)

m <- matrix(runif(100), nrow = 25, ncol = 4)

## benchmark
m.test <- matrix(runif(4e6), ncol = 4)

microbenchmark::microbenchmark(
  Rfast={
    rr <- rowRanks(m.test)
    sm <- rowSort(m.test)
    sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]},
  baseR={
    rr <- t(apply(m.test, 1, rank))
    sm <- t(apply(m.test, 1, sort))
    sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]
  },
  forloop={
    om <- t(apply(m.test, 1, order, decreasing = T))
    sm <- m.test
    for (i in seq_len(nrow(m.test))) {
      sm[i, ] <- sm[i, om[i, ]]
    }
    for (i in seq_len(nrow(m.test))) {
      sm[i, ] <- sm[i, order(om[i, ])]
    }
  }, times=3L
)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM