Eigen + MKL or OpenBLAS slower than Numpy/Scipy + OpenBLAS

I'm just starting out with C++ and want to work with matrices and speed things up in general. I previously worked with Python + NumPy + OpenBLAS, and I assumed that C++ + Eigen + MKL would be faster, or at least not slower.

My C++ code:

#define EIGEN_USE_MKL_ALL
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/LU>
#include <chrono>

using namespace std;
using namespace Eigen;

int main()
{
    int n = Eigen::nbThreads( );
    cout << "#Threads: " << n << endl;

    uint16_t size = 4000;
    MatrixXd a = MatrixXd::Random(size,size);

    clock_t start = clock ();
    PartialPivLU<MatrixXd> lu = PartialPivLU<MatrixXd>(a);

    float timeElapsed = double( clock() - start ) / CLOCKS_PER_SEC; 
    cout << "Elasped time is " << timeElapsed << " seconds." << endl ;
}

My Python code:

import numpy as np
from time import time
from scipy import linalg as la

size = 4000

A = np.random.random((size, size))

t = time()
LU, piv = la.lu_factor(A)
print(time()-t)

My timings:

C++     2.4s
Python  1.2s

Why is C++ slower than Python?

I am compiling the C++ code using:

g++ main.cpp -o main -lopenblas -O3 -fopenmp  -DMKL_LP64 -I/usr/local/include/mkl/include

MKL is definitely working: if I disable it, the running time is around 13s.

I also tried C++ + OpenBLAS which gives me around 2.4s as well.

Any ideas why C++ and Eigen are slower than numpy/scipy?

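The timing is just wrong. This is a typical symptom of confusing wall-clock time with CPU time: on POSIX systems clock() reports the CPU time used by the whole process, summed over all of its threads, so for a multithreaded factorization it overstates the elapsed time. When I use the system_clock from the <chrono> header, it "magically" becomes faster.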

#define EIGEN_USE_MKL_ALL
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/LU>
#include <chrono>

int main()
{
    int const n = Eigen::nbThreads( );
    std::cout << "#Threads: " << n << std::endl;

    int const size = 4000;
    Eigen::MatrixXd a = Eigen::MatrixXd::Random(size,size);

    auto start = std::chrono::system_clock::now();

    Eigen::PartialPivLU<Eigen::MatrixXd> lu(a);

    auto stop = std::chrono::system_clock::now();

    std::cout << "Elasped time is "
              << std::chrono::duration<double>{stop - start}.count()
              << " seconds." << std::endl;
}

I compile with

icc -O3 -mkl -std=c++11 -DNDEBUG -I/usr/include/eigen3/ test.cpp

and get the output

#Threads: 1
Elasped time is 0.295782 seconds.

Your Python version reports 0.399146080017 on my machine.


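Alternatively, to obtain comparable timings, you could measure CPU time in Python as well, using time.clock() instead of time.time() (wall-clock time). Note that time.clock() was removed in Python 3.8; time.process_time() is its CPU-time replacement.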
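
The difference between the two kinds of clock can be seen directly in Python. This is a minimal sketch, assuming Python 3.3 or later, where time.process_time() reports CPU time and time.perf_counter() reports wall-clock time. The helper name timed is made up for illustration:

```python
import time

def timed(fn):
    """Run fn() once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock time
    cpu_start = time.process_time()    # CPU time of the whole process
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

# Sleeping takes wall-clock time but almost no CPU time,
# so the two measurements diverge.
wall, cpu = timed(lambda: time.sleep(0.2))
print(f"wall: {wall:.3f} s, cpu: {cpu:.3f} s")
```

Passing lambda: la.lu_factor(A) instead of the sleep should show the opposite effect when the BLAS library runs several threads: the CPU figure can then exceed the wall-clock figure, because CPU time is accumulated across all threads of the process.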

This is not a fair comparison. The Python routine is operating on float precision while the C++ code needs to crunch doubles. This exactly doubles the computation time.

>>> type(np.random.random_sample())
<type 'float'>

You should compare with MatrixXf instead of MatrixXd and your MKL code should be equally fast.
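
The precision claim can be checked before switching to MatrixXf. The type shown above is Python's built-in float, which CPython implements as a C double, and NumPy's random routines return float64 arrays by default, so the Python code may already be working in double precision just like MatrixXd. A minimal sketch of the check using only the standard library:

```python
import struct
import sys

# Number of mantissa bits in Python's built-in float:
# 53 corresponds to an IEEE 754 double, 24 would be single precision.
mantissa_bits = sys.float_info.mant_dig

# Size in bytes of a C double versus a C float in the native layout.
bytes_per_double = struct.calcsize("d")
bytes_per_float = struct.calcsize("f")

print("mantissa bits in float:", mantissa_bits)
print("sizeof(double):", bytes_per_double, "sizeof(float):", bytes_per_float)
```

For the array itself, inspecting A.dtype in the question's script is the most direct test of which precision SciPy is factorizing.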
