Matrix multiplication performance

Code goes as follows:

In [180]: rng = np.random.RandomState(123)

In [181]: A1 = rng.uniform(size=(10000,80))

In [182]: B1 = rng.uniform(size=(10000,30))

In [183]: A2 = rng.uniform(size=(80,10000))

In [184]: B2 = rng.uniform(size=(30,10000))

In [185]: %timeit np.dot(A1.T, B1)
10 loops, best of 3: 136 ms per loop

In [186]: %timeit np.dot(A2, B2.T)
10 loops, best of 3: 25.1 ms per loop

In [4]: %timeit np.dot(A2, B1)
10 loops, best of 3: 56.3 ms per loop
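The `%timeit` lines above only run inside IPython. A minimal plain-Python sketch of the same benchmark uses the standard `timeit` module instead. The `cases` and `results` names are illustrative, and the absolute numbers depend on the machine and on which BLAS library NumPy is linked against, so no expected timings are given here.

```python
import timeit

import numpy as np

# Same inputs as the IPython session above (float64 arrays).
rng = np.random.RandomState(123)
A1 = rng.uniform(size=(10000, 80))
B1 = rng.uniform(size=(10000, 30))
A2 = rng.uniform(size=(80, 10000))
B2 = rng.uniform(size=(30, 10000))

# Each variant produces an (80, 30) result.
cases = {
    "np.dot(A1.T, B1)": lambda: np.dot(A1.T, B1),
    "np.dot(A2, B2.T)": lambda: np.dot(A2, B2.T),
    "np.dot(A2, B1)": lambda: np.dot(A2, B1),
}

results = {}
for label, fn in cases.items():
    assert fn().shape == (80, 30)  # sanity check on the output shape
    # Best of 3 repeats of 10 calls each, mirroring "10 loops, best of 3".
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    results[label] = best
    print(f"{label}: {best * 1e3:.1f} ms per loop")
```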

I want to multiply (A1, B1) and (A2, B2) to form an (80, 30) matrix. The difference is that A1 is defined with the transposed shape of A2: A1 has 10000 rows, while A2 has 80 rows. The same goes for B1 and B2.

The performance is quite different. I guess it's because the memory layout of a numpy.array is more cache-friendly with large columns than with large rows, right? But how?
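One way to check the layout hypothesis is to inspect each array's strides and contiguity flags. This is a minimal sketch, assuming the default float64 dtype from `uniform` and NumPy's default C (row-major) order. It only shows the layout; how `np.dot` hands a transposed view to the underlying BLAS is an implementation detail of NumPy and the linked BLAS, so this alone does not explain the exact timings. The names `A1T` and `A1T_c` are illustrative.

```python
import numpy as np

rng = np.random.RandomState(123)
A1 = rng.uniform(size=(10000, 80))  # float64, C order: each 80-element row is contiguous
B1 = rng.uniform(size=(10000, 30))
A2 = rng.uniform(size=(80, 10000))  # float64, C order: each 10000-element row is contiguous

A1T = A1.T  # a view on A1's buffer with the strides swapped; no data is copied

for name, arr in [("A1", A1), ("A1.T", A1T), ("A2", A2)]:
    print(f"{name:5} shape={arr.shape} strides={arr.strides} "
          f"C_CONTIGUOUS={arr.flags['C_CONTIGUOUS']} "
          f"F_CONTIGUOUS={arr.flags['F_CONTIGUOUS']}")

# An explicit C-ordered copy changes the memory layout but not the product.
A1T_c = np.ascontiguousarray(A1T)
same_result = np.allclose(np.dot(A1T, B1), np.dot(A1T_c, B1))
print("same result:", same_result)
```

With float64 elements, `A1.T` reports strides `(8, 640)` and is F-contiguous (walking along a row jumps 640 bytes), while `A2` reports `(80000, 8)` and is C-contiguous (walking along a row touches adjacent memory), even though both have shape (80, 10000).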

MCM (matrix chain multiplication) is an algorithm that computes the most efficient way to multiply a chain of matrices; it's worth studying to learn more about matrix multiplication. In general, matrix multiplication is not commutative, but it is associative, and depending on the order in which you evaluate the products you can get radically different run times for the computation.
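To make the MCM idea concrete, here is a sketch of the textbook dynamic-programming solution to the matrix chain ordering problem. The function name `matrix_chain_order` and the `M1`, `M2`, ... labels are hypothetical. The cost model counts scalar multiplications (p·q·r for a p×q by q×r product) and ignores memory layout, so it says nothing about the cache effects asked about in the question.

```python
def matrix_chain_order(dims):
    """Return (min_cost, parenthesization) for the chain M1..Mn,
    where Mi has shape (dims[i-1], dims[i])."""
    n = len(dims) - 1
    # cost[i][j]: minimum scalar multiplications for M(i+1)..M(j+1) (0-based, inclusive)
    cost = [[0] * n for _ in range(n)]
    # split[i][j]: index k of the best split between M(k+1) and M(k+2)
    split = [[0] * n for _ in range(n)]

    for length in range(2, n + 1):  # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):
                # Left part has shape (dims[i], dims[k+1]); right part (dims[k+1], dims[j+1]).
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k

    def paren(i, j):
        # Rebuild the optimal parenthesization from the recorded splits.
        if i == j:
            return f"M{i + 1}"
        k = split[i][j]
        return f"({paren(i, k)} {paren(k + 1, j)})"

    return cost[0][n - 1], paren(0, n - 1)


print(matrix_chain_order([10, 100, 5, 50]))  # → (7500, '((M1 M2) M3)')
```

For dims [10, 100, 5, 50], computing (M1 M2) first costs 10·100·5 + 10·5·50 = 7500 multiplications, versus 100·5·50 + 10·100·50 = 75000 for M1 (M2 M3): same result, ten times the work.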

Standard Wiki
