Why does MATLAB/Octave wipe the floor with C++ in Eigenvalue Problems?

I'm hoping that the answer to the question in the title is that I'm doing something stupid!

Here is the problem. I want to compute all the eigenvalues and eigenvectors of a real, symmetric matrix. I have implemented code in MATLAB (actually, I run it using Octave) and in C++, using the GNU Scientific Library. I am providing my full code below for both implementations.

As far as I can understand, GSL comes with its own implementation of the BLAS API (hereafter I refer to this as GSLCBLAS), and to use this library I compile using:

g++ -O3 -lgsl -lgslcblas

GSL suggests using an alternative BLAS library, such as the self-optimizing ATLAS library, for improved performance. I am running Ubuntu 12.04, and have installed the ATLAS packages from the Ubuntu repository. In this case, I compile using:

g++ -O3 -lgsl -lcblas -latlas -lm

For all three cases (Octave, GSLCBLAS, and ATLAS), I have performed experiments with randomly generated matrices of sizes 100 to 1000 in steps of 100. For each size, I perform 10 eigendecompositions with different matrices, and average the time taken. The results are these:

[Figure: plot of the results]

The difference in performance is ridiculous. For a matrix of size 1000, Octave performs the decomposition in under a second; GSLCBLAS and ATLAS take around 25 seconds.

I suspect that I may be using the ATLAS library incorrectly. Any explanations are welcome; thanks in advance.

Some notes on the code:

  • In the C++ implementation, there is no need to make the matrix symmetric, because the function only uses the lower triangular part of it.

  • In Octave, the line triu(A) + triu(A, 1)' forces the matrix to be symmetric.

  • If you wish to compile the C++ code on your own Linux machine, you also need to add the flag -lrt, because of the clock_gettime function.

  • Unfortunately I don't think clock_gettime exists on other platforms. Consider changing it to gettimeofday.
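As a possible alternative to both clock_gettime and gettimeofday, standard C++11 provides std::chrono::steady_clock, a monotonic clock that needs no -lrt flag. The following is only a minimal sketch and is not part of the original benchmark; the Stopwatch class name is hypothetical.

```cpp
#include <chrono>

// Hypothetical helper: measures wall-clock time elapsed since construction.
// std::chrono::steady_clock is monotonic, so later readings never go backwards.
class Stopwatch
{
public:
    Stopwatch() : start_(std::chrono::steady_clock::now()) {}

    // Seconds elapsed since the stopwatch was created, as a double.
    double elapsedSeconds() const
    {
        const auto now = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(now - start_).count();
    }

private:
    std::chrono::steady_clock::time_point start_;
};
```

In the benchmark loop, one would create a Stopwatch just before the call to gsl_eigen_symmv and read elapsedSeconds() right after it, in place of the two clock_gettime calls and the manual nanosecond arithmetic.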

Octave Code

K = 10;

fileID = fopen('octave_out.txt','w');

for N = 100:100:1000
    AverageTime = 0.0;

    for k = 1:K
        A = randn(N, N);
        A = triu(A) + triu(A, 1)';
        tic;
        eig(A);
        AverageTime = AverageTime + toc/K;
    end

    disp([num2str(N), " ", num2str(AverageTime), "\n"]);
    fprintf(fileID, '%d %f\n', N, AverageTime);
end

fclose(fileID);

C++ Code

#include <iostream>
#include <fstream>
#include <time.h>

#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_eigen.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_matrix.h>

int main()
{
    const int K = 10;

    gsl_rng * RandomNumberGenerator = gsl_rng_alloc(gsl_rng_default);
    gsl_rng_set(RandomNumberGenerator, 0);

    std::ofstream OutputFile("atlas.txt", std::ios::trunc);

    for (int N = 100; N <= 1000; N += 100)
    {
        gsl_matrix* A = gsl_matrix_alloc(N, N);
        gsl_eigen_symmv_workspace* EigendecompositionWorkspace = gsl_eigen_symmv_alloc(N);
        gsl_vector* Eigenvalues = gsl_vector_alloc(N);
        gsl_matrix* Eigenvectors = gsl_matrix_alloc(N, N);

        double AverageTime = 0.0;
        for (int k = 0; k < K; k++)
        {   
            for (int i = 0; i < N; i++)
            {
                for (int j = 0; j < N; j++)
                {
                    gsl_matrix_set(A, i, j, gsl_ran_gaussian(RandomNumberGenerator, 1.0));
                }
            }

            timespec start, end;
            clock_gettime(CLOCK_MONOTONIC_RAW, &start);

            gsl_eigen_symmv(A, Eigenvalues, Eigenvectors, EigendecompositionWorkspace);

            clock_gettime(CLOCK_MONOTONIC_RAW, &end);
            double TimeElapsed = (double) ((1e9*end.tv_sec + end.tv_nsec) - (1e9*start.tv_sec + start.tv_nsec))/1.0e9;
            AverageTime += TimeElapsed/K;
            std::cout << "N = " << N << ", k = " << k << ", Time = " << TimeElapsed << std::endl;
        }
        OutputFile << N << " " << AverageTime << std::endl;

        gsl_matrix_free(A);
        gsl_eigen_symmv_free(EigendecompositionWorkspace);
        gsl_vector_free(Eigenvalues);
        gsl_matrix_free(Eigenvectors);
    }

    return 0;
}

I disagree with the previous post. This is not a threading issue, this is an algorithm issue. The reason MATLAB, R, and Octave wipe the floor with C++ libraries is that their own libraries use more complex, better algorithms. If you read the Octave page you can find out what they do [1]:

Eigenvalues are computed in a several step process which begins with a Hessenberg decomposition, followed by a Schur decomposition, from which the eigenvalues are apparent. The eigenvectors, when desired, are computed by further manipulations of the Schur decomposition.

Solving eigenvalue/eigenvector problems is non-trivial. In fact, it's one of the few things "Numerical Recipes in C" recommends you don't implement yourself (p461). GSL is often slow, which was my initial response. ALGLIB is also slow for its standard implementation (I'm getting about 12 seconds!):

#include <iostream>
#include <iomanip>
#include <cstdlib>  // for atoi
#include <ctime>

#include <linalg.h>

using std::cout;
using std::setw;
using std::endl;

const int VERBOSE = false;

int main(int argc, char** argv)
{

    int size = 0;
    if(argc != 2) {
        cout << "Please provide a size of input" << endl;
        return -1;
    } else {
        size = atoi(argv[1]);
        cout << "Array Size: " << size << endl;
    }

    alglib::real_2d_array mat;
    alglib::hqrndstate state;
    alglib::hqrndrandomize(state);
    mat.setlength(size, size);
    for(int rr = 0 ; rr < mat.rows(); rr++) {
        for(int cc = 0 ; cc < mat.cols(); cc++) {
            mat[rr][cc] = mat[cc][rr] = alglib::hqrndnormal(state);
        }
    }

    if(VERBOSE) {
        cout << "Matrix: " << endl;
        for(int rr = 0 ; rr < mat.rows(); rr++) {
            for(int cc = 0 ; cc < mat.cols(); cc++) {
                cout << setw(10) << mat[rr][cc];
            }
            cout << endl;
        }
        cout << endl;
    }

    alglib::real_1d_array d;
    alglib::real_2d_array z;
    auto t = clock();
    alglib::smatrixevd(mat, mat.rows(), 1, 0, d, z);
    t = clock() - t;

    cout << (double)t/CLOCKS_PER_SEC << "s" << endl;

    if(VERBOSE) {
        for(int cc = 0 ; cc < mat.cols(); cc++) {
            cout << "lambda: " << d[cc] << endl;
            cout << "V: ";
            for(int rr = 0 ; rr < mat.rows(); rr++) {
                cout << setw(10) << z[rr][cc];
            }
            cout << endl;
        }
    }
}

If you really need a fast library, you will probably need to do some real hunting.

[1] http://www.gnu.org/software/octave/doc/interpreter/Basic-Matrix-Functions.html

I have also encountered this problem. The real cause is that eig() in MATLAB does not calculate the eigenvectors, but the C version of the code above does. The difference in time spent can be larger than one order of magnitude, as shown in the figure below. So the comparison is not fair.

In MATLAB, depending on the return value, the actual function called will be different. To force the calculation of eigenvectors, [V,D] = eig(A) should be used (see the code below).

The actual time to compute an eigenvalue problem depends heavily on the matrix properties and the desired results, such as

  • Real or complex
  • Hermitian/symmetric or not
  • Dense or sparse
  • Eigenvalues only, eigenvectors, maximum eigenvalue only, etc.
  • Serial or parallel

There are algorithms optimized for each of the above cases. In GSL, these algorithms are picked manually, so a wrong selection will decrease performance significantly. Some C++ wrapper classes, and some languages such as MATLAB and Mathematica, choose the optimized version automatically by some method.
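To illustrate what such automatic selection might look like, here is a minimal sketch that inspects a matrix and picks a solver family. The enum, the function names, and the row-major std::vector layout are all assumptions made for illustration; this is not how MATLAB or Mathematica are known to do it. In GSL, the four cases would correspond to gsl_eigen_symm, gsl_eigen_symmv, gsl_eigen_nonsymm, and gsl_eigen_nonsymmv.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical labels for the solver families discussed above.
enum class EigenSolverKind
{
    SymmetricValuesOnly,
    SymmetricWithVectors,
    GeneralValuesOnly,
    GeneralWithVectors
};

// Returns true if the n x n row-major matrix `a` equals its transpose
// within `tolerance` (absolute difference per entry).
bool isSymmetric(const std::vector<double>& a, std::size_t n, double tolerance = 0.0)
{
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            if (std::fabs(a[i * n + j] - a[j * n + i]) > tolerance)
                return false;
    return true;
}

// Picks a solver family from the matrix structure and the requested output.
EigenSolverKind chooseSolver(const std::vector<double>& a, std::size_t n, bool wantVectors)
{
    if (isSymmetric(a, n))
        return wantVectors ? EigenSolverKind::SymmetricWithVectors
                           : EigenSolverKind::SymmetricValuesOnly;
    return wantVectors ? EigenSolverKind::GeneralWithVectors
                       : EigenSolverKind::GeneralValuesOnly;
}
```

The symmetry check costs O(n^2), which is small next to the O(n^3) decomposition, so a wrapper can afford to run it on every call.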

Also, MATLAB and Mathematica use parallelization. This further broadens the gap you see by a few times, depending on the machine. It is reasonable to say that calculating the eigenvalues and the eigenvectors of a general complex 1000x1000 matrix takes about one second and ten seconds, respectively, without parallelization.

Fig. Comparison of MATLAB and C. The "+ vec" means the code included the calculation of the eigenvectors. The CPU% is a very rough observation of CPU usage at N=1000, which is upper bounded by 800%, though they are supposed to fully use all 8 cores. The gap between MATLAB and C is smaller than 8 times.

Fig. Comparison of different matrix types in Mathematica. Algorithms are picked automatically by the program.

Matlab (WITH the calculation of eigenvectors)

K = 10;

fileID = fopen('octave_out.txt','w');

for N = 100:100:1000
    AverageTime = 0.0;

    for k = 1:K
        A = randn(N, N);
        A = triu(A) + triu(A, 1)';
        tic;
        [V,D] = eig(A);
        AverageTime = AverageTime + toc/K;
    end

    disp([num2str(N), ' ', num2str(AverageTime)]);  % disp already ends the line; a single-quoted '\n' would print literally
    fprintf(fileID, '%d %f\n', N, AverageTime);
end

fclose(fileID);

C++ (WITHOUT the calculation of eigenvectors)

#include <iostream>
#include <fstream>
#include <time.h>

#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_eigen.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_matrix.h>

int main()
{
    const int K = 10;

    gsl_rng * RandomNumberGenerator = gsl_rng_alloc(gsl_rng_default);
    gsl_rng_set(RandomNumberGenerator, 0);

    std::ofstream OutputFile("atlas.txt", std::ios::trunc);

    for (int N = 100; N <= 1000; N += 100)
    {
        gsl_matrix* A = gsl_matrix_alloc(N, N);
        gsl_eigen_symm_workspace* EigendecompositionWorkspace = gsl_eigen_symm_alloc(N);
        gsl_vector* Eigenvalues = gsl_vector_alloc(N);

        double AverageTime = 0.0;
        for (int k = 0; k < K; k++)
        {   
            for (int i = 0; i < N; i++)
            {
                for (int j = i; j < N; j++)
                {
                    double rn = gsl_ran_gaussian(RandomNumberGenerator, 1.0);
                    gsl_matrix_set(A, i, j, rn);
                    gsl_matrix_set(A, j, i, rn);
                }
            }

            timespec start, end;
            clock_gettime(CLOCK_MONOTONIC_RAW, &start);

            gsl_eigen_symm(A, Eigenvalues, EigendecompositionWorkspace);

            clock_gettime(CLOCK_MONOTONIC_RAW, &end);
            double TimeElapsed = (double) ((1e9*end.tv_sec + end.tv_nsec) - (1e9*start.tv_sec + start.tv_nsec))/1.0e9;
            AverageTime += TimeElapsed/K;
            std::cout << "N = " << N << ", k = " << k << ", Time = " << TimeElapsed << std::endl;
        }
        OutputFile << N << " " << AverageTime << std::endl;

        gsl_matrix_free(A);
        gsl_eigen_symm_free(EigendecompositionWorkspace);
        gsl_vector_free(Eigenvalues);
    }

    return 0;
}

Mathematica

(* Symmetric real matrix + eigenvectors *)
Table[{NN, Mean[Table[(
     M = Table[Random[], {i, NN}, {j, NN}];
     M = M + Transpose[Conjugate[M]];
     AbsoluteTiming[Eigensystem[M]][[1]]
     ), {K, 10}]]
  }, {NN, Range[100, 1000, 100]}]

(* Symmetric real matrix *)
Table[{NN, Mean[Table[(
     M = Table[Random[], {i, NN}, {j, NN}];
     M = M + Transpose[Conjugate[M]];
     AbsoluteTiming[Eigenvalues[M]][[1]]
     ), {K, 10}]]
  }, {NN, Range[100, 1000, 100]}]

(* Asymmetric real matrix *)
Table[{NN, Mean[Table[(
     M = Table[Random[], {i, NN}, {j, NN}];
     AbsoluteTiming[Eigenvalues[M]][[1]]
     ), {K, 10}]]
  }, {NN, Range[100, 1000, 100]}]

(* Hermitian matrix *)
Table[{NN, Mean[Table[(
     M = Table[Random[] + I Random[], {i, NN}, {j, NN}];
     M = M + Transpose[Conjugate[M]];
     AbsoluteTiming[Eigenvalues[M]][[1]]
     ), {K, 10}]]
  }, {NN, Range[100, 1000, 100]}]

(* Random complex matrix *)
Table[{NN, Mean[Table[(
     M = Table[Random[] + I Random[], {i, NN}, {j, NN}];
     AbsoluteTiming[Eigenvalues[M]][[1]]
     ), {K, 10}]]
  }, {NN, Range[100, 1000, 100]}]

In the C++ implementation, there is no need to make the matrix symmetric, because the function only uses the lower triangular part of it.

This may not be the case. In the reference, it is stated that:

int gsl_eigen_symmv(gsl_matrix * A, gsl_vector * eval, gsl_matrix * evec, gsl_eigen_symmv_workspace * w)

This function computes the eigenvalues and eigenvectors of the real symmetric matrix A. Additional workspace of the appropriate size must be provided in w. The diagonal and lower triangular part of A are destroyed during the computation, but the strict upper triangular part is not referenced. The eigenvalues are stored in the vector eval and are unordered. The corresponding eigenvectors are stored in the columns of the matrix evec. For example, the eigenvector in the first column corresponds to the first eigenvalue. The eigenvectors are guaranteed to be mutually orthogonal and normalised to unit magnitude.

It seems that you also need to apply a similar symmetrization operation in C++ in order to at least get correct results, although the performance should be the same.
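A minimal sketch of such a symmetrization step follows, assuming a row-major std::vector<double> instead of a gsl_matrix so that the example stays self-contained. The function name is hypothetical. It mirrors the effect of the Octave line triu(A) + triu(A, 1)' by copying the upper triangle into the lower one.

```cpp
#include <cstddef>
#include <vector>

// Copies the strict upper triangle of the n x n row-major matrix `a`
// into its strict lower triangle, so that a(j, i) == a(i, j) afterwards.
// The diagonal is left untouched.
void symmetrizeFromUpper(std::vector<double>& a, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            a[j * n + i] = a[i * n + j];
}
```

With GSL the same loop body would be gsl_matrix_set(A, j, i, gsl_matrix_get(A, i, j)). Whether the step changes the result depends on which triangle the routine actually reads, as described in the documentation quoted above.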

On the MATLAB side, eigenvalue decomposition may be faster due to its multithreaded execution, as stated in this reference:

Built-in Multithreading

Linear algebra and numerical functions such as fft, \ (mldivide), eig, svd, and sort are multithreaded in MATLAB. Multithreaded computations have been on by default in MATLAB since Release 2008a. These functions automatically execute on multiple computational threads in a single MATLAB session, allowing them to execute faster on multicore-enabled machines. Additionally, many functions in Image Processing Toolbox™ are multithreaded.

In order to test the performance of MATLAB on a single core, you can disable multithreading via

File>Preferences>General>Multithreading

in R2007a or newer, as stated here.
