Matrix multiplication performance
The code is as follows:
In [180]: rng = np.random.RandomState(123)
In [181]: A1 = rng.uniform(size=(10000,80))
In [182]: B1 = rng.uniform(size=(10000,30))
In [183]: A2 = rng.uniform(size=(80,10000))
In [184]: B2 = rng.uniform(size=(30,10000))
In [185]: %timeit np.dot(A1.T, B1)
10 loops, best of 3: 136 ms per loop
In [186]: %timeit np.dot(A2, B2.T)
10 loops, best of 3: 25.1 ms per loop
In [4]: %timeit np.dot(A2, B1)
10 loops, best of 3: 56.3 ms per loop
I want to multiply (A1, B1) and (A2, B2) to form an (80, 30) matrix. The difference is that A1 has the shape of the transpose of A2, with 10000 rows in A1 but 80 rows in A2. The same goes for B1 and B2.
The performance is quite different. I guess it's because the memory layout of numpy.array is more cache-friendly when there are many columns than when there are many rows, right? But how?
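As a starting point for the layout question, here is a minimal sketch that inspects how the arrays from the session above sit in memory. It only uses the `flags` and `strides` attributes and `np.ascontiguousarray`; the names `A1T` and `A1T_c` are introduced here for illustration. It does not by itself explain the timings, which also depend on the BLAS library NumPy is linked against.

```python
import numpy as np

rng = np.random.RandomState(123)
A1 = rng.uniform(size=(10000, 80))
A2 = rng.uniform(size=(80, 10000))

# NumPy arrays are C-ordered (row-major) by default:
# the elements of one row are adjacent in memory.
print(A1.flags['C_CONTIGUOUS'], A1.strides)   # True (640, 8)
print(A2.flags['C_CONTIGUOUS'], A2.strides)   # True (80000, 8)

# .T does not move any data. It returns a view with swapped strides,
# so A1.T has shape (80, 10000) but is column-major (Fortran order).
A1T = A1.T
print(A1T.flags['C_CONTIGUOUS'], A1T.flags['F_CONTIGUOUS'], A1T.strides)
# False True (8, 640)

# Forcing a row-major copy gives the same layout as A2.
A1T_c = np.ascontiguousarray(A1T)
print(A1T_c.flags['C_CONTIGUOUS'], A1T_c.strides)   # True (80000, 8)
```

So `np.dot(A1.T, B1)` and `np.dot(A2, B2.T)` produce results of the same shape but hand differently laid-out operands to the underlying routine. Timing `np.dot(A1T_c, B1)` separately would show how much of the gap comes from layout alone.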
MCM (matrix chain multiplication) is an algorithm that computes the most efficient order in which to multiply a chain of matrices; it's worth studying to learn more about matrix multiplication. In general, matrix multiplication is not commutative, but it is associative, and depending on the order in which you evaluate the products you can get radically different run times.
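To make the point about ordering concrete, here is a small sketch of the classic dynamic-programming formulation of MCM. The function name `matrix_chain_cost` and the example dimensions are made up for illustration. The cost model counts scalar multiplications only and ignores memory layout and cache effects.

```python
import numpy as np

def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications for a matrix chain.

    Matrix i in the chain has shape (dims[i], dims[i + 1]).
    """
    n = len(dims) - 1                      # number of matrices
    cost = [[0] * n for _ in range(n)]     # cost[i][j]: best cost for matrices i..j
    for length in range(2, n + 1):         # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            # Try every split point k: (i..k) times (k+1..j).
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]

# Shapes (10, 1000), (1000, 5), (5, 200):
# (AB)C costs 10*1000*5 + 10*5*200       =    60,000 multiplications
# A(BC) costs 1000*5*200 + 10*1000*200   = 3,000,000 multiplications
print(matrix_chain_cost([10, 1000, 5, 200]))   # 60000

# NumPy ships np.linalg.multi_dot, which picks the evaluation order itself.
rng = np.random.RandomState(0)
A = rng.uniform(size=(10, 1000))
B = rng.uniform(size=(1000, 5))
C = rng.uniform(size=(5, 200))
result = np.linalg.multi_dot([A, B, C])
print(np.allclose(result, (A @ B) @ C))        # True
```

With only two operands, as in the question, there is a single possible order, so this mainly matters for chains of three or more matrices.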