Faster way of filling a matrix in R

I want to fill a matrix in R, but every column must contain the vector shifted one step further down, so that in effect the result is a lower triangular matrix. My attempt is this:


x <- c(3, 4, 8, 9)
E <- matrix(0, length(x), length(x)); E
for (i in 1:nrow(E)) {
  E[i, 1] <- x[i]
}
E
for (i in 2:nrow(E)) {
  for (j in 2:ncol(E)) {
    E[i, 2] <- x[i - 1]
  }
}
E
for (i in 3:nrow(E)) {
  for (j in 3:ncol(E)) {
    E[i, 3] <- x[i - 2]
  }
}
E
for (i in 4:nrow(E)) {
  for (j in 4:ncol(E)) {
    E[i, 4] <- x[i - 3]
  }
}
E

Each time I remove an element from the vector. But is there a faster way to do it with fewer for loops, and for a vector of length n instead of 4, as a generalization?
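For reference, a single pair of nested loops reproduces the same pattern for any vector length n (a minimal sketch of the generalization being asked for; still loop-based, so the answers below are faster):

x <- c(3, 4, 8, 9)
n <- length(x)
E <- matrix(0, n, n)
for (j in 1:n) {
  for (i in j:n) {
    E[i, j] <- x[i - j + 1]  # column j is x shifted down by j - 1 positions
  }
}
E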

Sorry, I couldn't resist. Here's another base approach:

x <- c(3, 4, 8, 9)
n <- length(x)
E <- diag(rep(x[1], n))                       # x[1] along the diagonal
j <- unlist(sapply(n:2, function(i) x[2:i]))  # x[2:n], x[2:(n-1)], ..., x[2]
E[lower.tri(E)] <- j                          # fill below the diagonal, column by column
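Printing E confirms the target matrix:

E
#      [,1] [,2] [,3] [,4]
# [1,]    3    0    0    0
# [2,]    4    3    0    0
# [3,]    8    4    3    0
# [4,]    9    8    4    3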

Added to Rui's benchmark code, we get this:

(benchmark plot)

I think it would be interesting to add this code to the benchmark:

TIC <- function(x) {
  E <- diag(x)  # n x n matrix; its diagonal is (re)written below anyway
  # sequence(rev(seq_along(x))) yields 1:n, 1:(n-1), ..., 1, which matches the
  # column-major order of the lower triangle including the diagonal
  E[lower.tri(E, TRUE)] <- x[sequence(rev(seq_along(x)))]
  E
}

which gives

> TIC(x)
     [,1] [,2] [,3] [,4]
[1,]    3    0    0    0
[2,]    4    3    0    0
[3,]    8    4    3    0
[4,]    9    8    4    3

and the following benchmark comparison:

(benchmark plot)

Here is a base R way.

E <- diag(length(x))
# For each column of the lower-triangular pattern, pad x with enough leading
# zeros that it starts at the diagonal, then truncate to n entries.
apply(lower.tri(E, diag = TRUE), 2, function(i) {
  c(rep(0, nrow(E) - sum(i)), x)[seq_along(x)]
})
#     [,1] [,2] [,3] [,4]
#[1,]    3    0    0    0
#[2,]    4    3    0    0
#[3,]    8    4    3    0
#[4,]    9    8    4    3

Performance tests

If the question is about faster code, here are benchmarks.


The functions below are my code and Ben Bolker's.

Rui <- function(x){
  E <- diag(length(x))
  inx <- seq_along(x)
  apply(lower.tri(E, diag = TRUE), 2, function(i) {
    c(rep(0, nrow(E) - sum(i)), x)[inx]
  })
}

Ben <- function(x){
  E <- matrix(0, nrow=length(x), ncol=length(x))
  diag(E) <- x[1]
  for (i in 2:length(x)) {
    E[row(E)==col(E)+i-1] <- x[i]
  }
  E
}

The tests run on vectors of increasing size; the results are plotted with ggplot2.

library(microbenchmark)
library(ggplot2)

test_speed <- function(n){
  out <- lapply(1:n, function(i){
    x <- sample(10*i)
    mb <- microbenchmark(
      Rui = Rui(x),
      Ben = Ben(x)
    )
    mb <- aggregate(time ~ expr, mb, median)
    mb$size <- 10*i
    mb
  })
  out <- do.call(rbind, out)
  out
}

res <- test_speed(10)

ggplot(res, aes(size, time, color = expr)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(trans = "log10")

(benchmark plot: median time vs. vector size, log scale)
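To also time the TIC() solution from the answer above, one could add it to the microbenchmark() call inside test_speed() (a sketch, assuming TIC() is defined in the session):

mb <- microbenchmark(
  Rui = Rui(x),
  Ben = Ben(x),
  TIC = TIC(x)
)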

This isn't super-efficient, but it's better than your solution. (The inefficiency is that we construct the row()/col() matrices and generate a full logical matrix on each iteration, rather than using direct indexing.) On the other hand, it seems almost instantaneous for length(x) == 100 (though it gets slow by 1000).

E <- matrix(0, nrow = length(x), ncol = length(x))
diag(E) <- x[1]
for (i in 2:length(x)) {
  E[row(E) == col(E) + i - 1] <- x[i]  # fill the (i-1)-th sub-diagonal
}

It's possible that someone has written more efficient code (in Rcpp?) for indexing sub-diagonals/off-diagonal elements of a matrix.
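Even in base R, the inefficiency noted above can be avoided by indexing each sub-diagonal directly with a matrix of (row, column) pairs instead of materializing row()/col() matrices; a sketch (fill_subdiagonals is a hypothetical name, not from the thread):

fill_subdiagonals <- function(x) {
  n <- length(x)
  E <- matrix(0, n, n)
  for (i in seq_len(n)) {
    rows <- i:n                    # the (i-1)-th sub-diagonal starts at row i
    cols <- seq_len(n - i + 1)     # ... and ends at column n - i + 1
    E[cbind(rows, cols)] <- x[i]   # matrix indexing: one (row, col) pair per cell
  }
  E
}
fill_subdiagonals(c(3, 4, 8, 9))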

Despite its slowness, the advantage of this one (IMO) is that it's a little easier to understand; you can also adjust it to many different patterns by choosing different conditions on the relationship between rows and columns, as in the example below.
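For instance, flipping the condition to row(E) == col(E) - i + 1 fills the super-diagonals instead, giving the upper triangular analogue (a small sketch along the same lines):

x <- c(3, 4, 8, 9)
E <- matrix(0, nrow = length(x), ncol = length(x))
diag(E) <- x[1]
for (i in 2:length(x)) {
  E[row(E) == col(E) - i + 1] <- x[i]  # fill the (i-1)-th super-diagonal
}
E  # upper triangular: x shifted rightward along each row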
