Multiplying numbers as a matrix

Question

Can someone tell me the best way to multiply a series of numbers as a matrix?

What I mean is this: the matrix multiplication algorithms I have seen operate on two-dimensional arrays such as matrix1[4][4] and matrix2[4][4]. However, I want to multiply numbers stored in flat arrays, matrix1[16] and matrix2[16].

Is there an algorithm that performs this multiplication as fast as possible using float numbers?

Thank you very much for your help.

EDIT

I have used cBLAS and done some speed tests, and I was surprised by the results.

    #include <stdio.h>
    #include <stdlib.h>
    #include <cblas.h>
    #include <GL/glfw.h>

    void matriz_matriz(float *matriz1,float *matriz2,float *matrizr){
      matrizr[0]  = (matriz1[0]*matriz2[0])+(matriz1[4]*matriz2[1])  +(matriz1[8]*matriz2[2])  +(matriz1[12]*matriz2[3]);
      matrizr[1]  = (matriz1[1]*matriz2[0])+(matriz1[5]*matriz2[1])  +(matriz1[9]*matriz2[2])  +(matriz1[13]*matriz2[3]);
      matrizr[2]  = (matriz1[2]*matriz2[0])+(matriz1[6]*matriz2[1])  +(matriz1[10]*matriz2[2]) +(matriz1[14]*matriz2[3]);
      matrizr[3]  = (matriz1[3]*matriz2[0])+(matriz1[7]*matriz2[1])  +(matriz1[11]*matriz2[2]) +(matriz1[15]*matriz2[3]);

      matrizr[4]  = (matriz1[0]*matriz2[4])+(matriz1[4]*matriz2[5])  +(matriz1[8]*matriz2[6])  +(matriz1[12]*matriz2[7]);
      matrizr[5]  = (matriz1[1]*matriz2[4])+(matriz1[5]*matriz2[5])  +(matriz1[9]*matriz2[6])  +(matriz1[13]*matriz2[7]);
      matrizr[6]  = (matriz1[2]*matriz2[4])+(matriz1[6]*matriz2[5])  +(matriz1[10]*matriz2[6]) +(matriz1[14]*matriz2[7]);
      matrizr[7]  = (matriz1[3]*matriz2[4])+(matriz1[7]*matriz2[5])  +(matriz1[11]*matriz2[6]) +(matriz1[15]*matriz2[7]);

      matrizr[8]  = (matriz1[0]*matriz2[8])+(matriz1[4]*matriz2[9])  +(matriz1[8]*matriz2[10]) +(matriz1[12]*matriz2[11]);
      matrizr[9]  = (matriz1[1]*matriz2[8])+(matriz1[5]*matriz2[9])  +(matriz1[9]*matriz2[10]) +(matriz1[13]*matriz2[11]);
      matrizr[10] = (matriz1[2]*matriz2[8])+(matriz1[6]*matriz2[9])  +(matriz1[10]*matriz2[10])+(matriz1[14]*matriz2[11]);
      matrizr[11] = (matriz1[3]*matriz2[8])+(matriz1[7]*matriz2[9])  +(matriz1[11]*matriz2[10])+(matriz1[15]*matriz2[11]);

      matrizr[12] = (matriz1[0]*matriz2[12])+(matriz1[4]*matriz2[13])+(matriz1[8]*matriz2[14]) +(matriz1[12]*matriz2[15]);
      matrizr[13] = (matriz1[1]*matriz2[12])+(matriz1[5]*matriz2[13])+(matriz1[9]*matriz2[14]) +(matriz1[13]*matriz2[15]);
      matrizr[14] = (matriz1[2]*matriz2[12])+(matriz1[6]*matriz2[13])+(matriz1[10]*matriz2[14])+(matriz1[14]*matriz2[15]);
      matrizr[15] = (matriz1[3]*matriz2[12])+(matriz1[7]*matriz2[13])+(matriz1[11]*matriz2[14])+(matriz1[15]*matriz2[15]);
    }


    int main(){
      int i;
      double tiempo1;
      double tiempo2;

      glfwInit();

      float *mat0 = NULL;
      float *mat1 = NULL;
      float *mat2 = NULL;

      mat0  = (float *)malloc(16 * sizeof(float));
      mat1  = (float *)malloc(16 * sizeof(float));
      mat2  = (float *)malloc(16 * sizeof(float));

      mat0[0]  =  1.0;
      mat0[1]  =  0.0;
      mat0[2]  =  0.0;
      mat0[3]  =  0.0;
      mat0[4]  =  0.0;
      mat0[5]  =  1.0;
      mat0[6]  =  0.0;
      mat0[7]  =  0.0;
      mat0[8]  =  0.0;
      mat0[9]  =  0.0;
      mat0[10] =  1.0;
      mat0[11] =  0.0;
      mat0[12] =  3.281897;
      mat0[13] =  4.714289;
      mat0[14] =  5.124306;
      mat0[15] =  1.0;

      mat1[0]  =  1.0;
      mat1[1]  =  0.0;
      mat1[2]  =  0.0;
      mat1[3]  =  0.0;
      mat1[4]  =  0.0;
      mat1[5]  =  0.924752;
      mat1[6]  =  0.380570;
      mat1[7]  =  0.0;
      mat1[8]  =  0.0;
      mat1[9]  = -0.380570;
      mat1[10] =  0.924752;
      mat1[11] =  0.0;
      mat1[12] =  0.0;
      mat1[13] =  0.0;
      mat1[14] =  0.0;
      mat1[15] =  1.0;

      mat2[0]  =  1.0;
      mat2[1]  =  0.0;
      mat2[2]  =  0.0;
      mat2[3]  =  0.0;
      mat2[4]  =  0.0;
      mat2[5]  =  1.0;
      mat2[6]  =  0.0;
      mat2[7]  =  0.0;
      mat2[8]  =  0.0;
      mat2[9]  =  0.0;
      mat2[10] =  1.0;
      mat2[11] =  0.0;
      mat2[12] =  0.0;
      mat2[13] =  0.0;
      mat2[14] =  0.0;
      mat2[15] =  1.0;

       tiempo1 = glfwGetTime();

       for(i=0;i<100000;i++){
        matriz_matriz(mat0,mat1,mat2);
        //cblas_sgemm(CblasRowMajor,CblasNoTrans,CblasNoTrans,4,4,4,1.0f,mat0,4,mat1,4,0.0f,mat2,4);
       }

      tiempo2 = glfwGetTime();
      printf("Tiempo total: %f\n",tiempo2-tiempo1);

      for(i=0;i<16;i++)printf("valor[%i]: %f\n",i,mat2[i]);

      free(mat0);
      free(mat1);
      free(mat2);

      system("pause");

      glfwTerminate();
      return 0;
    }

If I use cblas_sgemm(...), tiempo2 - tiempo1 is 0.096924, but if I use my own function matriz_matriz(...), tiempo2 - tiempo1 is 0.046271. What is happening? My function is faster than CBLAS.

This test was run on a PC with a Pentium 3 processor. Can anyone tell me what is going on?

Thank you very much.

Answer 1

Honestly, if you're doing any kind of linear algebra, by far your best bet is to use libraries designed for the purpose, such as BLAS, LAPACK, etc. You will have a very hard time approaching their speed with your own code.

Matrix-matrix operations are BLAS Level 3, and the particular one you want is SGEMM() for floats and DGEMM() for doubles. The fastest BLAS implementations on Intel hardware are OpenBLAS (derived from GotoBLAS) and the BLAS implementation in Intel's MKL (Math Kernel Library). ATLAS is also very fast if you compile it yourself.

Answer 2

A version for a 2 x 2 matrix (based on link):

#include<iostream>
using namespace std;

int main()
{
    const int rows = 2;
    const int cols = 2;

    float a[4]={1,2,3,4};
    float b[4]={1,2,3,4};
    float c[4]={0,0,0,0};

    for (int i = 0; i <rows; i++) {
        for (int j = 0; j <cols; j++) 
        {   
            float sum = 0.0;
            for (int k = 0; k < rows; k++)
                sum = sum + a[i * cols + k] * b[k * cols + j]; 
            c[i * cols + j] = sum;
        }   
    }   
    for (int ix =0; ix <4; ++ix)
            cout << c[ix] << ' ';

}

Answer 1: score 3, accepted, 2013-08-29 18:02:36

Answer 2: score 0, 2013-08-29 18:10:06
