Fastest way to multiply matrix columns with vector elements in R
I have a matrix m and a vector v. I would like to multiply the first column of m by the first element of v, the second column of m by the second element of v, and so on. I can do it with the following code, but I am looking for a way that does not require the two transpose calls. How can I do this faster in R?
m <- matrix(rnorm(120000), ncol=6)
v <- c(1.5, 3.5, 4.5, 5.5, 6.5, 7.5)
system.time(t(t(m) * v))
# user system elapsed
# 0.02 0.00 0.02
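For readers wondering why the two transposes are there at all, here is a minimal sketch on a tiny matrix (the values are made up for illustration and are not from the question). R stores matrices column by column and recycles v along that order, so a plain m * v lines v up with the rows; transposing first turns the columns into rows.

```r
# Toy data, chosen only so the result is easy to read.
m <- matrix(1:6, nrow = 2)   # columns are (1, 2), (3, 4), (5, 6)
v <- c(10, 100, 1000)

# v is recycled down the columns of t(m), i.e. along the columns of m.
scaled <- t(t(m) * v)

# Each column of m multiplied by the matching element of v.
expected <- matrix(c(10, 20, 300, 400, 5000, 6000), nrow = 2)
print(isTRUE(all.equal(scaled, expected)))
```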
Use some linear algebra and perform matrix multiplication, which is quite fast in R. For example:
m %*% diag(v)
Some benchmarking:
m = matrix(rnorm(1200000), ncol=6)
v=c(1.5, 3.5, 4.5, 5.5, 6.5, 7.5)
library(microbenchmark)
microbenchmark(m %*% diag(v), t(t(m) * v))
## Unit: milliseconds
## expr min lq median uq max neval
## m %*% diag(v) 16.57174 16.78104 16.86427 23.13121 109.9006 100
## t(t(m) * v) 26.21470 26.59049 32.40829 35.38097 122.9351 100
If you have a larger number of columns, your t(t(m) * v) solution outperforms the matrix multiplication solution by a wide margin. There is an even faster solution, but it comes at a high cost in memory usage: use rep() to create a vector with as many elements as m and multiply elementwise. Here's the comparison, modifying mnel's example:
m = matrix(rnorm(1200000), ncol=600)
v = rep(c(1.5, 3.5, 4.5, 5.5, 6.5, 7.5), length = ncol(m))
library(microbenchmark)
microbenchmark(t(t(m) * v),
m %*% diag(v),
m * rep(v, rep.int(nrow(m),length(v))),
m * rep(v, rep(nrow(m),length(v))),
m * rep(v, each = nrow(m)))
# Unit: milliseconds
# expr min lq mean median uq max neval
# t(t(m) * v) 17.682257 18.807218 20.574513 19.239350 19.818331 62.63947 100
# m %*% diag(v) 415.573110 417.835574 421.226179 419.061019 420.601778 465.43276 100
# m * rep(v, rep.int(nrow(m), ncol(m))) 2.597411 2.794915 5.947318 3.276216 3.873842 48.95579 100
# m * rep(v, rep(nrow(m), ncol(m))) 2.601701 2.785839 3.707153 2.918994 3.855361 47.48697 100
# m * rep(v, each = nrow(m)) 21.766636 21.901935 23.791504 22.351227 23.049006 66.68491 100
As you can see, using "each" in rep() sacrifices speed for clarity. The difference between rep.int and rep seems to be negligible; the two implementations swap places on repeated runs of microbenchmark(). Keep in mind that ncol(m) == length(v).
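A small sketch of what the rep() calls build, using toy values that are not part of the benchmark: each element of v is repeated nrow(m) times, which matches the column-by-column order in which R stores m, so the elementwise product scales each column.

```r
# Toy data, for illustration only.
m <- matrix(1:6, nrow = 2)
v <- c(10, 100, 1000)

# Both calls expand v to 10 10 100 100 1000 1000 (one run per column of m).
expanded_each  <- rep(v, each = nrow(m))
expanded_times <- rep(v, rep.int(nrow(m), length(v)))

# Elementwise product keeps the dimensions of m.
by_rep       <- m * expanded_times
by_transpose <- t(t(m) * v)
print(isTRUE(all.equal(by_rep, by_transpose)))
```

The expanded vector has length(m) elements, which is where the extra memory cost comes from.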
As @Arun points out, I don't know that you'll beat your solution in terms of time efficiency. In terms of code understandability, there are other options, though:
One option:
> mapply("*",as.data.frame(m),v)
V1 V2 V3
[1,] 0.0 0.0 0.0
[2,] 1.5 0.0 0.0
[3,] 1.5 3.5 0.0
[4,] 1.5 3.5 4.5
And another:
sapply(1:ncol(m),function(x) m[,x] * v[x] )
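As a quick check that both options give the same numbers as the transpose version, here is a sketch on toy data (not the matrix used for the output above). The only difference I would expect is that the mapply() result carries the V1, V2, ... column names from as.data.frame(); seq_len() is used instead of 1:ncol(m) as a defensive habit.

```r
# Toy data, for illustration only.
m <- matrix(1:6, nrow = 2)
v <- c(10, 100, 1000)

by_mapply <- mapply("*", as.data.frame(m), v)
by_sapply <- sapply(seq_len(ncol(m)), function(j) m[, j] * v[j])
reference <- t(t(m) * v)

# Drop the column names before comparing the mapply() result.
print(isTRUE(all.equal(unname(by_mapply), reference)))
print(isTRUE(all.equal(by_sapply, reference)))
```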
For the sake of completeness, I added sweep to the benchmark. Despite its somewhat misleading argument names, I think it may be more readable than the other alternatives, and it is also quite fast:
n = 1000
M = matrix(rnorm(2 * n * n), nrow = n)
v = rnorm(2 * n)
microbenchmark::microbenchmark(
M * rep(v, rep.int(nrow(M), length(v))),
sweep(M, MARGIN = 2, STATS = v, FUN = `*`),
t(t(M) * v),
M * rep(v, each = nrow(M)),
M %*% diag(v)
)
Unit: milliseconds
                                       expr         min          lq        mean      median          uq        max neval
    M * rep(v, rep.int(nrow(M), length(v)))    5.259957    5.535376    9.994405    6.158703    7.606777   66.21271   100
 sweep(M, MARGIN = 2, STATS = v, FUN = `*`)   16.083039   17.260790   22.724433   20.479928   23.830074   85.24550   100
                                t(t(M) * v)   19.547392   20.748929   29.868819   24.722213   29.222172   92.25538   100
                 M * rep(v, each = nrow(M))   34.803229   37.088510   41.518962   39.920664   42.659752  106.70252   100
                              M %*% diag(v) 1827.301864 1876.806506 2004.140725 1986.152972 2096.172601 2432.88704   100
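For anyone unfamiliar with sweep, a minimal sketch on toy data (not the benchmark matrix): MARGIN = 2 says the vector runs along the columns, STATS is the vector itself, and FUN is the operation applied. The argument names make more sense for the function's default use, which is subtracting a summary statistic from each row or column.

```r
# Toy data, for illustration only.
M <- matrix(1:6, nrow = 2)
v <- c(10, 100, 1000)

# Multiply column j of M by v[j].
by_sweep <- sweep(M, MARGIN = 2, STATS = v, FUN = `*`)

# Same result as the double-transpose version.
print(isTRUE(all.equal(by_sweep, t(t(M) * v))))
```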
As bluegrue showed, a simple rep is also enough to perform the element-wise multiplication. Compared with a plain matrix multiplication by diag(v), this reduces the number of multiplications and additions by a wide margin, because all the multiplications by zero are avoided.
m = matrix(rnorm(1200000), ncol=6)
v=c(1.5, 3.5, 4.5, 5.5, 6.5, 7.5)
v2 <- rep(v,each=dim(m)[1])
library(microbenchmark)
microbenchmark(m %*% diag(v), t(t(m) * v), m*v2)
Unit: milliseconds
expr min lq mean median uq max neval cld
m %*% diag(v) 11.269890 13.073995 16.424366 16.470435 17.700803 95.78635 100 b
t(t(m) * v) 9.794000 11.226271 14.018568 12.995839 15.010730 88.90111 100 b
m * v2 2.322188 2.559024 3.777874 3.011185 3.410848 67.26368 100 a
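To put rough numbers on the avoided work, here is a back-of-the-envelope sketch for the dimensions used above. It assumes a naive dense product that does not skip zeros; an optimized BLAS may behave differently, so treat the counts as an illustration rather than a measurement.

```r
n_row <- 200000                          # 1200000 values in 6 columns
n_col <- 6
v <- c(1.5, 3.5, 4.5, 5.5, 6.5, 7.5)

# Naive dense product: every output cell sums n_col products.
mults_dense_product <- n_row * n_col * n_col   # 7200000
# Elementwise product: one multiplication per cell of m.
mults_elementwise   <- n_row * n_col           # 1200000

# Share of entries in diag(v) that are zero: 30 of 36.
zero_share <- mean(diag(v) == 0)
print(c(mults_dense_product, mults_elementwise))
print(zero_share)
```

The gap grows with the number of columns, since the dense product scales with the square of ncol(m) while the elementwise product scales linearly.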
If you're working with sparse matrices, element-wise methods (i.e., expanding v to a dense matrix) are likely to exceed memory constraints, so I'm going to exclude those proposed methods from this analysis.
The Matrix package provides a Diagonal function that tremendously improves the diagonal multiplication approach.
library(Matrix)
library(microbenchmark)
N_ROW = 5000
N_COL = 10000
M <- rsparsematrix(N_ROW, N_COL, density=0.1)
v <- rnorm(N_COL)
microbenchmark(
M %*% Diagonal(length(v), v),
t(t(M) * v),
sweep(M, MARGIN=2, STATS=v, FUN='*'))
# Unit: milliseconds
# expr min lq mean median uq max neval
# M %*% Diagonal(length(v), v) 36.46755 39.03379 47.75535 40.41116 43.30241 248.1036 100
# t(t(M) * v) 207.70560 223.35126 269.08112 250.25379 284.49382 511.0403 100
# sweep(M, MARGIN = 2, STATS = v, FUN = "*") 1682.45263 1869.87220 1941.54691 1924.80218 1999.62484 3104.8305 100
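A small correctness check to go with the timings, as a sketch on a tiny random sparse matrix (the sizes and seed are arbitrary and not from the benchmark): the product with Diagonal() should stay sparse and agree with the dense m %*% diag(v) result.

```r
library(Matrix)

set.seed(1)                               # arbitrary seed for reproducibility
M <- rsparsematrix(5, 4, density = 0.3)   # tiny sparse matrix for illustration
v <- c(1.5, 3.5, 4.5, 5.5)

# Scale column j of M by v[j] without building a dense diagonal matrix.
scaled <- M %*% Diagonal(x = v)

# The result is still a sparse matrix.
print(methods::is(scaled, "sparseMatrix"))

# It agrees with the dense computation.
dense_reference <- as.matrix(M) %*% diag(v)
print(isTRUE(all.equal(as.matrix(scaled), dense_reference,
                       check.attributes = FALSE)))
```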