Matrix multiplication performance

Question

The code is as follows:

In [180]: rng = np.random.RandomState(123)

In [181]: A1 = rng.uniform(size=(10000,80))

In [182]: B1 = rng.uniform(size=(10000,30))

In [183]: A2 = rng.uniform(size=(80,10000))

In [184]: B2 = rng.uniform(size=(30,10000))

In [185]: %timeit np.dot(A1.T, B1)
10 loops, best of 3: 136 ms per loop

In [186]: %timeit np.dot(A2, B2.T)
10 loops, best of 3: 25.1 ms per loop

In [4]: %timeit np.dot(A2, B1)
10 loops, best of 3: 56.3 ms per loop

I want to multiply (A1, B1) and (A2, B2) to form an (80, 30) matrix. The difference is that A1 has the shape of A2's transpose: A1 has 10000 rows, while A2 has 80 rows. The same holds for B1 and B2.

The performance is quite different. I guess that is because the memory layout of numpy.array is more cache-friendly when an array has many columns than when it has many rows, right? But how exactly?
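
One way to check the layout guess is to look at the shape, strides, and contiguity flags of each operand passed to np.dot. This is a minimal sketch that assumes only NumPy; the helper name layout is made up for illustration, and it shows how the data is laid out, not why the timings differ.

```python
import numpy as np

rng = np.random.RandomState(123)
A1 = rng.uniform(size=(10000, 80))
B1 = rng.uniform(size=(10000, 30))
A2 = rng.uniform(size=(80, 10000))
B2 = rng.uniform(size=(30, 10000))


def layout(arr):
    """Hypothetical helper: (shape, strides in bytes, C-contiguous?, F-contiguous?)."""
    return (arr.shape, arr.strides,
            arr.flags['C_CONTIGUOUS'], arr.flags['F_CONTIGUOUS'])


# The four operands used in the two timed products from the question.
for name, arr in [("A1.T", A1.T), ("B1", B1), ("A2", A2), ("B2.T", B2.T)]:
    print(name, layout(arr))
```

Transposing does not copy the data: A1.T and B2.T are views with swapped strides, so np.dot(A1.T, B1) and np.dot(A2, B2.T) pass differently laid-out operands to the underlying routine. Whether that accounts for the timing gap depends on the BLAS library NumPy is linked against.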

Answer 1

Matrix chain multiplication (MCM) is an algorithm that computes the most efficient order in which to multiply a sequence of matrices; it is worth studying to learn more about matrix multiplication. In general, matrix multiplication is not commutative, but it is associative, so the order in which a chain of products is evaluated can radically change the run time of the computation.
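
To make the MCM idea concrete, here is a minimal sketch of the textbook dynamic-programming version, which returns the smallest number of scalar multiplications needed for a chain. The function name matrix_chain_cost and the example dimensions are assumptions chosen for illustration; nothing here comes from NumPy itself.

```python
def matrix_chain_cost(dims):
    """Minimum scalar multiplications needed to multiply a chain of matrices.

    Matrix i has shape dims[i] x dims[i + 1], so the chain holds
    len(dims) - 1 matrices.
    """
    n = len(dims) - 1
    if n <= 0:
        return 0
    # cost[i][j] = cheapest cost of multiplying matrices i..j (inclusive)
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            # Try every split point k: (i..k) times (k+1..j)
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]


# Three matrices: 10x30, 30x5, 5x60
print(matrix_chain_cost([10, 30, 5, 60]))
# The two-matrix product from the question: 80x10000 times 10000x30
print(matrix_chain_cost([80, 10000, 30]))
```

For the three-matrix example, multiplying the first two matrices first costs 10*30*5 + 10*5*60 = 4500 scalar multiplications, while multiplying the last two first costs 30*5*60 + 10*30*60 = 27000. A product of only two matrices, as in the question, has a single possible order, so its cost is fixed at 80*10000*30 scalar multiplications.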

Solution 1
0 2014-07-15 02:27:34
