Cache-friendly method to multiply two matrices

Question

I want to multiply two matrices using a cache-friendly method (one that leads to fewer cache misses).

I found out that this can be done with a cache-friendly transpose function.

But I am not able to find this algorithm. How can I achieve this?

Answer 1

The word you are looking for is thrashing. Searching Google for "thrashing matrix multiplication" yields more results.

A standard multiplication algorithm for c = a*b would look like

void multiply(double[,] a, double[,] b, double[,] c)
{
    int n = a.GetLength(0); // assumes square n x n matrices and a zero-initialized c
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i, j] += a[i, k] * b[k, j];
}

Basically, striding through memory in large steps is detrimental to performance. The access pattern for k in b[k, j] does exactly that. So instead of jumping around in memory, we can rearrange the operations so that the innermost loops vary only the second index of the matrices:

void multiply(double[,] a, double[,] b, double[,] c)
{
   int n = a.GetLength(0); // assumes square n x n matrices
   for (int i = 0; i < n; i++)
   {
      // k = 0 initializes row i of c
      double t = a[i, 0];
      for (int j = 0; j < n; j++)
         c[i, j] = t * b[0, j];

      // the remaining k accumulate into row i of c
      for (int k = 1; k < n; k++)
      {
         t = a[i, k];
         for (int j = 0; j < n; j++)
            c[i, j] += t * b[k, j];
      }
   }
}

This was the example given on that page. However, another option is to copy column j of b, that is b[*, j], into an array beforehand and use this array in the inner-loop calculations. This approach is usually much faster than the alternatives, even though it involves copying data around. This might seem counter-intuitive, so feel free to try it for yourself.

void multiply(double[,] a, double[,] b, double[,] c)
{
    int n = a.GetLength(0); // assumes square n x n matrices
    double[] Bcolj = new double[n];
    for (int j = 0; j < n; j++)
    {
        // copy column j of b into a contiguous buffer
        for (int k = 0; k < n; k++)
            Bcolj[k] = b[k, j];

        for (int i = 0; i < n; i++)
        {
            double s = 0;
            for (int k = 0; k < n; k++)
                s += a[i, k] * Bcolj[k];
            c[i, j] = s;
        }
    }
}
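
The column-copy idea above can be taken one step further by transposing b once up front, which is the transpose-based approach the question asks about. Below is a minimal C++ sketch of that idea, not a tuned implementation. The flat row-major `std::vector<double>` layout, the `Matrix` alias, and the function names `transpose` and `multiply_transposed` are assumptions made for illustration, and the transpose itself is the plain (unblocked) one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout (hypothetical): an n x n matrix stored row-major in a
// flat vector, so element (r, c) lives at index r * n + c.
using Matrix = std::vector<double>;

// Returns the transpose of an n x n row-major matrix.
Matrix transpose(const Matrix& m, std::size_t n) {
    assert(m.size() == n * n);
    Matrix t(n * n);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c)
            t[c * n + r] = m[r * n + c];
    return t;
}

// Computes a * b. b is transposed once, so the innermost loop reads
// row i of a and row j of bt, both contiguous in memory.
Matrix multiply_transposed(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    const Matrix bt = transpose(b, n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                s += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = s;
        }
    }
    return c;
}
```

The transpose costs on the order of n*n element copies, while the multiplication performs on the order of n*n*n multiply-adds, so the extra pass is usually cheap relative to the work it makes contiguous. Whether it pays off for a given matrix size is something to measure rather than assume.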

Answer 2

@Cesar's answer is not correct. For example, the inner loop

for (int k = 0; k < n; k++)
   s += a[i,k] * Bcolj[k];

goes through the i-th column of a.

The following code should ensure we always visit data row by row.

#include <cstddef>

// I, K and J are deduced from the array arguments
template <std::size_t I, std::size_t K, std::size_t J>
void multiply(const double (&a)[I][K],
              const double (&b)[K][J],
              double (&c)[I][J])
{
    for (std::size_t j = 0; j < J; ++j) {
        // zero c[i][j] for every i
        for (std::size_t i = 0; i < I; ++i) {
            c[i][j] = 0;
        }

        // for each k, read b[k][j] once
        for (std::size_t k = 0; k < K; ++k) {
            double t = b[k][j];
            // accumulate a[i][k] * b[k][j] into c[i][j] for every i
            for (std::size_t i = 0; i < I; ++i) {
                c[i][j] += a[i][k] * t;
            }
        }
    }
}
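
In C and C++, built-in arrays are stored row-major, so the last index is the contiguous one. As a sketch under that layout, the loops can be ordered i, k, j so that the innermost loop varies only the last index of b and c. The flat `std::vector<double>` storage and the name `multiply_ikj` are assumptions made for illustration, not something taken from either answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout (hypothetical): n x n row-major matrices in flat
// vectors, element (r, c) at index r * n + c.
std::vector<double> multiply_ikj(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double t = a[i * n + k];  // read once per (i, k)
            // j is the last index, so row k of b and row i of c
            // are both traversed contiguously
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += t * b[k * n + j];
        }
    }
    return c;
}
```

Timing this against the other loop orders on matrices larger than the cache is the simplest way to see how much the access pattern matters on a particular machine.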

2 answers

Answer 1
4 accepted 2013-01-26 17:56:34

Answer 2
1 2015-10-21 23:41:09
