Xtensor: can't reach NumPy performance
I'm learning xtensor and want to get the same or even higher performance than NumPy. Unfortunately, I can't, and I need help.
I ran a benchmark similar to the one in this question:
Performance of xtensor types vs. NumPy for simple reduction
This is the C++ code, where I used pybind11 and xtensor-python:
bench.cpp
#include <iostream>
#define XTENSOR_USE_XSIMD
#include "xtensor/xtensor.hpp"
#include "xtensor/xfixed.hpp"
#include "xtensor/xarray.hpp"
#include "xtensor/xio.hpp"
#include "xtensor/xview.hpp"
#define FORCE_IMPORT_ARRAY // numpy C api loading
#include "xtensor-python/pytensor.hpp"
#include "xtensor-python/pyarray.hpp"
namespace py = pybind11;
// Lazy reduction: the sum is evaluated when the result is accessed with ().
inline double sum_pytensor(xt::pytensor<double, 1> &m)
{
    return xt::sum(m)();
}
// Immediate reduction: the sum is evaluated eagerly.
inline double sum_pytensor_immediate(xt::pytensor<double, 1> &m)
{
    return xt::sum(m, xt::evaluation_strategy::immediate)();
}
PYBIND11_MODULE(xtensor_basics, m)
{
    xt::import_numpy();
    m.def("compute_xtensor", &sum_pytensor);
    m.def("compute_xtensor_immediate", &sum_pytensor_immediate);
}
I build this with CMake:
CMakeLists.txt
cmake_minimum_required(VERSION 2.8.12)
project(xtensor_basics)
add_definitions(-DXTENSOR_ENABLE_XSIMD) # <-- does this do anything?
add_definitions(-DXTENSOR_USE_XSIMD)
add_subdirectory(pybind11)
pybind11_add_module(xtensor_basics bench.cpp)
include_directories(/home/--user--/include)
include_directories(/home/--user--/.miniconda3/lib/python3.7/site-packages/numpy/core/include)
and the following command:
cmake . && make
which creates xtensor_basics.cpython-37m-x86_64-linux-gnu.so
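Before timing anything, it can help to confirm that the built functions return the same sum as a reference. The following is only a minimal sketch: `check_agreement` is a hypothetical helper (not part of xtensor or NumPy), and it uses `math.fsum` from the standard library as the reference. The tolerances are assumptions chosen for illustration.

```python
import math

def check_agreement(funcs, data, rel_tol=1e-9):
    """Return {name: bool}: does each callable match math.fsum(data)?"""
    reference = math.fsum(data)  # accurately rounded reference sum
    return {
        name: math.isclose(f(data), reference, rel_tol=rel_tol, abs_tol=1e-12)
        for name, f in funcs.items()
    }

if __name__ == "__main__":
    # Stand-in callable so the sketch runs without the compiled module.
    print(check_agreement({"builtin_sum": sum}, [0.5, 0.25, 0.125]))
```

With the compiled module available, one would presumably pass something like `{"xtensor": xtensor_basics.compute_xtensor}` together with a NumPy float64 array instead of the stand-in.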
Then I run the benchmark with this Python file:
bench.py
import timeit
def time_each(func_names, sizes):
    # The setup string must stay unindented: timeit executes it as source code.
    setup = f'''
import numpy; import xtensor_basics
arr = numpy.random.randn({sizes})
'''
    # Best of 3 repeats, each repeat being the total time of 100 calls.
    tim = lambda func: min(timeit.Timer(f'{func}(arr)',
                                        setup=setup).repeat(3, 100))
    return [tim(func) for func in func_names]
from functools import partial
sizes = [10 ** i for i in range(7)]
funcs = ['numpy.sum',
         'xtensor_basics.compute_xtensor_immediate',
         'xtensor_basics.compute_xtensor']
sum_timer = partial(time_each, funcs)
times = list(map(sum_timer, sizes))
print(times)
from matplotlib import pyplot as plt
plt.figure(figsize=(5, 10))  # plt.figure, not plt.Figure, so plt.plot draws on it
plt.plot(times)
plt.legend(["numpy", "xtensor_immediate", "xtensor"])
plt.show()
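Note that `Timer.repeat(3, 100)` returns the total time of 100 calls per repeat, so each entry of `times` is a total, not a per-call time. With sizes ranging from 1 to 10^6, it may be easier to compare the functions per call and relative to NumPy. A minimal sketch, assuming the `times` layout produced by bench.py (one row per size, one column per function, NumPy first); `per_call_and_ratio` is a hypothetical helper name:

```python
def per_call_and_ratio(times, number=100):
    """times[i][j]: total seconds for `number` calls of function j at size i.

    Returns two nested lists of the same shape:
    per-call time in microseconds, and time relative to column 0.
    """
    per_call_us = [[t / number * 1e6 for t in row] for row in times]
    ratio = [[t / row[0] for t in row] for row in times]
    return per_call_us, ratio

if __name__ == "__main__":
    # Made-up numbers purely to show the shapes; not measured results.
    us, ratio = per_call_and_ratio([[1.0, 2.0, 4.0]], number=100)
    print(us, ratio)
```

A ratio that stays well above 1 for tiny arrays and approaches 1 for large ones would point to fixed per-call overhead rather than to the reduction itself.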
Result: (timing plot)
Directory (after building)
bench.cpp
bench.py
CMakeCache.txt
CMakeFiles
cmake_install.cmake
CMakeLists.txt
Makefile
pybind11 <--- cloned from the repo
xtensor_basics.cpython-37m-x86_64-linux-gnu.so
Include directory
All folders containing headers (I didn't build these libraries, I just copied the headers):
$ ls /home/--user--/include -1
xflens
xsimd
xtensor
xtensor-blas
xtensor-python
xtl
System
Ubuntu 18.04
g++ 7.4.0
numpy 1.16.4
openblas 0.2.20
python 3.7.3
xtensor 0.20.8
Question: What flags, definitions, etc. should I add to get the same performance?
Thanks in advance.
EDIT 1: When I built with cmake -DCMAKE_BUILD_TYPE=Release ., i.e. with optimisations enabled, the result improved, but it is still slower.
I changed CMakeLists.txt a bit:
cmake_minimum_required(VERSION 2.8.12)
project(xtensor_basics)
add_definitions(-DXTENSOR_ENABLE_XSIMD)
add_definitions(-DXTENSOR_USE_XSIMD)
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -O3 -mavx2 -ffast-math")
# ^^^^^^^^^^^^^^^^^^^
add_subdirectory(pybind11)
pybind11_add_module(xtensor_basics bench.cpp)
include_directories(/home/--user--/include)
include_directories(/home/--user--/.miniconda3/lib/python3.7/site-packages/numpy/core/include)
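Since `-mavx2` lets the compiler emit AVX2 instructions, the resulting module only runs on CPUs that support them. A rough way to check this on an x86 Linux machine is to look at the `flags` line of `/proc/cpuinfo`. This is only a sketch under that assumption; `cpu_flags` is a hypothetical helper, and other platforms expose this information differently.

```python
def cpu_flags(cpuinfo_text):
    """Extract the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. non-x86 or non-Linux)

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:  # assumption: Linux
            flags = cpu_flags(fh.read())
    except OSError:
        flags = set()
    print("avx2 available:", "avx2" in flags)
```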