简体   繁体   中英

Function faster than apply(…, …, range) in R

I've got a problem with writing around apply(), which works really slow. My assignment is to give the range of the rows of a Matrix without using apply(). I'm trying my best but I still need help...

This is what I got so far:

row.range <- function(X){
  Y <- matrix(0, nrow = nrow(X), ncol = 2)

    for(i in nrow(X)){
    Y[i, 1] <- min(X[i, ])
    Y[i, 2] <- max(X[i, ])
  }
  return(Y)
  print(Y)
}

Where X could be any numeric matrix. Sadly, the output is only 0s, except for the last row where I actually get the right minimum and maximum. Why does this only work for the last row?

For testing I used:

M <- matrix(sample(1:6, size = 100 * 5, replace = TRUE), ncol = 5)
row.range(X)

Any help would be very much appreciated :-)

Using a smaller reproducible example

set.seed(123)
M <- matrix(sample(1:6, size = 10 * 5, replace = TRUE), ncol = 5)

You could try the already fully optimized matrixStats::rowRanges function

matrixStats::rowRanges(M)
#       [,1] [,2]
#  [1,]    1    6
#  [2,]    3    6
#  [3,]    3    5
#  [4,]    3    6
#  [5,]    1    6
#  [6,]    1    6
#  [7,]    2    5
#  [8,]    1    6
#  [9,]    2    4
# [10,]    1    6

Or base R vectorized max.col function

cbind(M[cbind(1:nrow(M), max.col(-M))],
      M[cbind(1:nrow(M), max.col(M))])
#       [,1] [,2]
#  [1,]    1    6
#  [2,]    3    6
#  [3,]    3    5
#  [4,]    3    6
#  [5,]    1    6
#  [6,]    1    6
#  [7,]    2    5
#  [8,]    1    6
#  [9,]    2    4
# [10,]    1    6

Another semi-vectorized base R approach it to use pmin/pmax combined with do.call (which offers possible NA handling too), but that will require converting your matrix to a data.frame ( not recommended )

DF <- as.data.frame(M)
cbind(do.call(pmin.int, c(na.rm = TRUE, DF)),
      do.call(pmax.int, c(na.rm = TRUE, DF)))
#       [,1] [,2]
#  [1,]    1    6
#  [2,]    3    6
#  [3,]    3    5
#  [4,]    3    6
#  [5,]    1    6
#  [6,]    1    6
#  [7,]    2    5
#  [8,]    1    6
#  [9,]    2    4
# [10,]    1    6

As R being a vectorized language, by row operations will be often slow, thus either try vectorizing or use a package such as Rcpp in order to write compiled C/C++ loops (as was done in the first case)

In most radical cases, you still have hope to greatly optimize your loop using the compiler package


As regarding to your for loop (as was already mentioned by @PereG) you have a syntax error. Instead of for(i in nrow(X)) this should be for(i in 1:nrow(X)) . Otherwise you are operating only on the last row.

Comparison/Benchmark answers/ideas and original code for information.

Data matrix generated by M <- matrix(sample(1:6, size = 1e6 * 5, replace = TRUE), ncol = 5)

Code:

row.range <- function(X){
  Y <- matrix(0, nrow = nrow(X), ncol = 2)

  for(i in 1:nrow(X)){
    Y[i, 1] <- min(X[i, ])
    Y[i, 2] <- max(X[i, ])
  }
  return(Y)
}

testapply <- function(x) {
  t(apply(M,1,function(x) { c(min(x),max(x))} ))
}

testcbind <- function(x) {
  Min <- x[cbind(1:nrow(x),max.col(-x))]
  Max <- x[cbind(1:nrow(x),max.col(x))]
  return(cbind(Min,Max))
}

testpmin <- function(x) {
  DF <- as.data.frame(x)
  cbind(do.call(pmin.int, c(na.rm = TRUE, DF)),
        do.call(pmax.int, c(na.rm = TRUE, DF)))
}

Validation:

> head(testpmin(M))
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    1    6
[4,]    3    6
[5,]    1    5
[6,]    1    4
> head(testcbind(M))
     Min Max
[1,]   1   5
[2,]   2   6
[3,]   1   6
[4,]   3   6
[5,]   1   5
[6,]   1   4
> head(testapply(M))
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    1    6
[4,]    3    6
[5,]    1    5
[6,]    1    4
> head(row.range(M))
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    1    6
[4,]    3    6
[5,]    1    5
[6,]    1    4

Benchmark:

> microbenchmark(row.range(M),testapply(M),testcbind(M),testpmin(M),times=10)
Unit: milliseconds
         expr       min        lq      mean    median        uq       max neval
 row.range(M) 3935.1435 4620.8645 4969.8812 5001.3030 5316.3731 5808.4092    10
 testapply(M) 2819.5916 2912.5050 3272.5916 3168.1254 3735.2308 3932.8697    10
 testcbind(M)  248.3587  256.4928  364.5640  282.8879  496.4234  633.2248    10
  testpmin(M)  163.0500  173.0381  203.5254  188.8449  197.8690  385.3048    10

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM