I've got a problem with writing around apply(), which works really slow. My assignment is to give the range of the rows of a Matrix without using apply(). I'm trying my best but I still need help...
This is what I got so far:
row.range <- function(X){
Y <- matrix(0, nrow = nrow(X), ncol = 2)
for(i in nrow(X)){
Y[i, 1] <- min(X[i, ])
Y[i, 2] <- max(X[i, ])
}
return(Y)
print(Y)
}
Where X could be any numeric matrix. Sadly, the output is only 0s, except for the last row where I actually get the right minimum and maximum. Why does this only work for the last row?
For testing I used:
M <- matrix(sample(1:6, size = 100 * 5, replace = TRUE), ncol = 5)
row.range(X)
Any help would be very much appreciated :-)
Using a smaller reproducible example
set.seed(123)
M <- matrix(sample(1:6, size = 10 * 5, replace = TRUE), ncol = 5)
You could try the already fully optimized matrixStats::rowRanges
function
matrixStats::rowRanges(M)
# [,1] [,2]
# [1,] 1 6
# [2,] 3 6
# [3,] 3 5
# [4,] 3 6
# [5,] 1 6
# [6,] 1 6
# [7,] 2 5
# [8,] 1 6
# [9,] 2 4
# [10,] 1 6
Or base R vectorized max.col
function
cbind(M[cbind(1:nrow(M), max.col(-M))],
M[cbind(1:nrow(M), max.col(M))])
# [,1] [,2]
# [1,] 1 6
# [2,] 3 6
# [3,] 3 5
# [4,] 3 6
# [5,] 1 6
# [6,] 1 6
# [7,] 2 5
# [8,] 1 6
# [9,] 2 4
# [10,] 1 6
Another semi-vectorized base R approach it to use pmin/pmax
combined with do.call
(which offers possible NA
handling too), but that will require converting your matrix to a data.frame
( not recommended )
DF <- as.data.frame(M)
cbind(do.call(pmin.int, c(na.rm = TRUE, DF)),
do.call(pmax.int, c(na.rm = TRUE, DF)))
# [,1] [,2]
# [1,] 1 6
# [2,] 3 6
# [3,] 3 5
# [4,] 3 6
# [5,] 1 6
# [6,] 1 6
# [7,] 2 5
# [8,] 1 6
# [9,] 2 4
# [10,] 1 6
As R being a vectorized language, by row operations will be often slow, thus either try vectorizing or use a package such as Rcpp in order to write compiled C/C++ loops (as was done in the first case)
In most radical cases, you still have hope to greatly optimize your loop using the compiler package
As regarding to your for
loop (as was already mentioned by @PereG) you have a syntax error. Instead of for(i in nrow(X))
this should be for(i in 1:nrow(X))
. Otherwise you are operating only on the last row.
Comparison/Benchmark answers/ideas and original code for information.
Data matrix generated by M <- matrix(sample(1:6, size = 1e6 * 5, replace = TRUE), ncol = 5)
Code:
row.range <- function(X){
Y <- matrix(0, nrow = nrow(X), ncol = 2)
for(i in 1:nrow(X)){
Y[i, 1] <- min(X[i, ])
Y[i, 2] <- max(X[i, ])
}
return(Y)
}
testapply <- function(x) {
t(apply(M,1,function(x) { c(min(x),max(x))} ))
}
testcbind <- function(x) {
Min <- x[cbind(1:nrow(x),max.col(-x))]
Max <- x[cbind(1:nrow(x),max.col(x))]
return(cbind(Min,Max))
}
testpmin <- function(x) {
DF <- as.data.frame(x)
cbind(do.call(pmin.int, c(na.rm = TRUE, DF)),
do.call(pmax.int, c(na.rm = TRUE, DF)))
}
Validation:
> head(testpmin(M))
[,1] [,2]
[1,] 1 5
[2,] 2 6
[3,] 1 6
[4,] 3 6
[5,] 1 5
[6,] 1 4
> head(testcbind(M))
Min Max
[1,] 1 5
[2,] 2 6
[3,] 1 6
[4,] 3 6
[5,] 1 5
[6,] 1 4
> head(testapply(M))
[,1] [,2]
[1,] 1 5
[2,] 2 6
[3,] 1 6
[4,] 3 6
[5,] 1 5
[6,] 1 4
> head(row.range(M))
[,1] [,2]
[1,] 1 5
[2,] 2 6
[3,] 1 6
[4,] 3 6
[5,] 1 5
[6,] 1 4
Benchmark:
> microbenchmark(row.range(M),testapply(M),testcbind(M),testpmin(M),times=10)
Unit: milliseconds
expr min lq mean median uq max neval
row.range(M) 3935.1435 4620.8645 4969.8812 5001.3030 5316.3731 5808.4092 10
testapply(M) 2819.5916 2912.5050 3272.5916 3168.1254 3735.2308 3932.8697 10
testcbind(M) 248.3587 256.4928 364.5640 282.8879 496.4234 633.2248 10
testpmin(M) 163.0500 173.0381 203.5254 188.8449 197.8690 385.3048 10
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.