
Numpy vs Eigen vs Xtensor Linear Algebra Benchmark Oddity

I was recently comparing different Python and C++ matrix libraries against each other for their linear algebra performance, in order to decide which one(s) to use in an upcoming project. While there are multiple types of linear algebra operations, I have chosen to focus mainly on matrix inversion, as it seems to be the one giving strange results. I have written the code below for the comparison, but I think I must be doing something wrong.

C++ Code

    #include <iostream>
    #include "eigen/Eigen/Dense"
    #include <xtensor/xarray.hpp>
    #include <xtensor/xio.hpp>
    #include <xtensor/xview.hpp>
    #include <xtensor/xrandom.hpp>
    #include <xtensor-blas/xlinalg.hpp> //-lblas -llapack for cblas,   -llapack -L OpenBLAS/OpenBLAS_Install/lib -l:libopenblas.a -pthread for openblas

    //including accurate timer
    #include <chrono>
    //including vector array
    #include <vector>

    void basicMatrixComparisonEigen(std::vector<int> dims, int numrepeats = 1000);
    void basicMatrixComparisonXtensor(std::vector<int> dims, int numrepeats = 1000);
    
    int main()
    {
      std::vector<int> sizings{1, 10, 100, 1000, 10000, 100000};
    
      basicMatrixComparisonEigen(sizings, 2);
      basicMatrixComparisonXtensor(sizings,2);
      return 0;
    }
    
    
    void basicMatrixComparisonEigen(std::vector<int> dims, int numrepeats)
    {
      std::chrono::high_resolution_clock::time_point t1;
      std::chrono::high_resolution_clock::time_point t2;
      using time = std::chrono::high_resolution_clock;
    
      std::cout << "Timing Eigen: " << std::endl;
      for (auto &dim : dims)
      {
    
        std::cout << "Scale Factor: " << dim << std::endl;
        try
        {
          //Linear Operations
          auto l = Eigen::MatrixXd::Random(dim, dim);
    
          //Eigen Matrix inversion
          t1 = time::now();
          for (int i = 0; i < numrepeats; i++)
          {
            Eigen::MatrixXd pinv = l.completeOrthogonalDecomposition().pseudoInverse();
            //note this does not come out to be identity.  The inverse is wrong.
            //std::cout<<l*pinv<<std::endl;
          }
          t2 = time::now();
          std::cout << "Eigen Matrix inversion took: " << std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count() * 1000 / (double)numrepeats << " milliseconds." << std::endl;
          std::cout << "\n\n\n";
        }
        catch (const std::exception &e)
        {
          std::cout << "Error:   '" << e.what() << "'\n";
        }
      }
    }
    
    
    void basicMatrixComparisonXtensor(std::vector<int> dims, int numrepeats)
    {
      std::chrono::high_resolution_clock::time_point t1;
      std::chrono::high_resolution_clock::time_point t2;
      using time = std::chrono::high_resolution_clock;
    
      std::cout << "Timing Xtensor: " << std::endl;
      for (auto &dim : dims)
      {
    
        std::cout << "Scale Factor: " << dim << std::endl;
        try
        {
    
          //Linear Operations
          auto l = xt::random::randn<double>({dim, dim});
    
          //Xtensor Matrix inversion
          t1 = time::now();
          for (int i = 0; i < numrepeats; i++)
          {
            auto inverse = xt::linalg::pinv(l);
            //something is wrong here.  The inverse is not actually the inverse when you multiply it out. 
            //std::cout << xt::linalg::dot(inverse,l) << std::endl;
          }
          t2 = time::now();
          std::cout << "Xtensor Matrix inversion took: " << std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count() * 1000 / (double)numrepeats << " milliseconds." << std::endl;
        
          std::cout << "\n\n\n";
        }
        catch (const std::exception &e)
        {
          std::cout << "Error:   '" << e.what() << "'\n";
        }
      }
    }

This is compiled with:

g++ cpp_library.cpp -O2  -llapack -L OpenBLAS/OpenBLAS_Install/lib -l:libopenblas.a -pthread -march=native -o benchmark.exe

for OpenBLAS, and

g++ cpp_library.cpp -O2  -lblas -llapack -march=native -o benchmark.exe

for cBLAS.
g++ version 9.3.0.

And for Python 3:

    import numpy as np
    from datetime import datetime as dt

    #import timeit

    start = dt.now()
    l = np.random.rand(1000, 1000)
    for i in range(2):
        result = np.linalg.inv(l)
    end = dt.now()
    print("Completed in: " + str((end - start) / 2))
    #print(np.matmul(l, result))
    #print(np.dot(l, result))
    #Timeit also gives similar results
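The commented-out sanity check at the end can be made explicit with `np.allclose`, confirming that NumPy's result really is the inverse (the seed below is arbitrary, added only for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
l = rng.random((1000, 1000))
result = np.linalg.inv(l)

# The product of a matrix and its inverse should be the identity,
# up to floating-point tolerance.
assert np.allclose(l @ result, np.eye(1000), atol=1e-6)
print("inverse verified")
```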

I will focus on the largest size that runs in a reasonable amount of time on my computer: 1000x1000. I know that only 2 runs introduces a bit of variance, but I've run it with more and the results are roughly the same as below:

  • Eigen 3.3.9: 196.804 milliseconds
  • Xtensor/Xtensor-blas w/ OpenBlas: 378.156 milliseconds
  • Numpy 1.17.4: 172.582 milliseconds

Is this a reasonable result to expect? Why are the C++ libraries slower than Numpy? All 3 packages use some sort of LAPACK/BLAS backend, yet there is a significant difference between the 3. In particular, Xtensor will pin my CPU at 100% usage with OpenBLAS's threads, yet still manages to have worse performance.
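One experiment worth trying for the threading question is capping OpenBLAS's thread count and re-timing the run. `OPENBLAS_NUM_THREADS` is the standard knob for OpenBLAS builds, and it must be set before NumPy (or any BLAS-backed library) is loaded. A minimal sketch:

```python
import os
# Must be set before importing numpy, or OpenBLAS will already
# have sized its thread pool.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import time
import numpy as np

l = np.random.rand(500, 500)
t0 = time.perf_counter()
result = np.linalg.inv(l)
elapsed = time.perf_counter() - t0

# The single-threaded result is still a correct inverse.
assert np.allclose(l @ result, np.eye(500), atol=1e-6)
print(f"single-threaded inversion: {elapsed * 1000:.1f} ms")
```

For matrices around 1000x1000, thread oversubscription can cost more than the parallelism gains, which would be consistent with seeing 100% CPU usage together with worse wall-clock time.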

I'm wondering if the C++ libraries are actually performing the inverse/pseudoinverse of the matrix, and if this is what is causing these results. As noted in the commented sections of the C++ test code, when I sanity-checked the results from both Eigen and Xtensor, the product of the matrix and its inverse was not even close to the identity matrix. I tried with smaller matrices (10x10) thinking it might be a precision error, but the problem remained. In another test, I checked the rank, and these matrices are full rank. To be sure I wasn't going crazy, I tried inv() instead of pinv() in both cases, and the results are the same. Am I using the wrong functions for this linear algebra benchmark, or is Numpy twisting the knife on 2 dysfunctional low-level libraries?

EDIT: Thank you everyone for your interest in this problem. I think I have figured out the issue. I suspect Eigen and Xtensor's lazy evaluation was causing errors downstream, outputting random matrices instead of the inverted matrices. I was able to correct the strange numerical inversion failure with the following replacements in the code:

    auto temp = Eigen::MatrixXd::Random(dim, dim);
    Eigen::MatrixXd l(dim, dim);
    l = temp;

and

    auto temp = xt::random::randn<double>({dim, dim});
    xt::xarray<double> l = temp;

However, the timings didn't change much:

  • Eigen 3.3.9: 201.386 milliseconds
  • Xtensor/Xtensor-blas w/ OpenBlas: 337.299 milliseconds
  • Numpy 1.17.4: (from before) 172.582 milliseconds

A little strangely, adding -O3 and -ffast-math actually slowed the code down slightly. -march=native gave the biggest performance increase when I tried it. Also, OpenBLAS is 2-3X faster than CBLAS for these problems.

Firstly, you are not computing the same things.

To compute the inverse of the l matrix, use l.inverse() for Eigen and xt::linalg::inv() for xtensor.
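The distinction matters for timing: a plain inverse is typically an LU-based solve, while a pseudoinverse is computed from an SVD, which is considerably more expensive. A sketch of the same distinction in NumPy terms (not the C++ APIs above):

```python
import numpy as np

rng = np.random.default_rng(0)

# For an invertible square matrix, inv and pinv agree on the result;
# pinv is simply far more expensive, since it goes through an SVD.
a = rng.random((50, 50))
assert np.allclose(np.linalg.inv(a) @ a, np.eye(50))
assert np.allclose(np.linalg.pinv(a) @ a, np.eye(50))

# pinv additionally handles non-square matrices, where inv is undefined;
# for a full-column-rank tall matrix it acts as a left inverse.
b = rng.random((60, 40))
assert np.allclose(np.linalg.pinv(b) @ b, np.eye(40))
```

So benchmarking pinv() in one library against inv() in another compares two different algorithms, not two implementations of the same one.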

When you link BLAS to Eigen or xtensor, these operations are automatically dispatched to your chosen BLAS.

I tried replacing the pseudoinverse calls with the plain inverse functions, replaced auto with MatrixXd and xt::xtensor to avoid lazy evaluation, linked openblas to Eigen, xtensor and numpy, and compiled with only the -O3 flag. The following are the results on my MacBook Pro M1:

  • Eigen-3.3.9 (with openblas): ~38 ms
  • Eigen-3.3.9 (without openblas): ~85 ms
  • xtensor-master (with openblas): ~41 ms
  • Numpy-1.21.2 (with openblas): ~35 ms
