Performance of xtensor types vs. NumPy for simple reduction
I was trying out xtensor-python and started by writing a very simple sum function, after using the cookiecutter setup and enabling SIMD intrinsics with xsimd.
#include "xtensor/xmath.hpp"
#include "xtensor-python/pytensor.hpp"
#include "xtensor-python/pyarray.hpp"

inline double sum_pytensor(xt::pytensor<double, 1>& m)
{
    return xt::sum(m)();
}

inline double sum_pyarray(xt::pyarray<double>& m)
{
    return xt::sum(m)();
}
I used setup.py to build my Python module, then tested the summation function on NumPy arrays constructed from np.random.randn at different sizes, comparing against np.sum.
import timeit
from functools import partial

def time_each(func_names, size):
    setup = f'''
import numpy; import xtensor_basics
arr = numpy.random.randn({size})
'''
    tim = lambda func: min(timeit.Timer(f'{func}(arr)',
                                        setup=setup).repeat(7, 100))
    return [tim(func) for func in func_names]

sizes = [10 ** i for i in range(9)]
funcs = ['numpy.sum',
         'xtensor_basics.sum_pyarray',
         'xtensor_basics.sum_pytensor']
sum_timer = partial(time_each, funcs)
times = list(map(sum_timer, sizes))
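The post doesn't show how the results table below was produced; a minimal, hypothetical helper (not from the original post, column widths chosen arbitrarily) could render `times` like this using only the standard library:

```python
# Hypothetical helper (not from the original post) to render `times`
# as the results table below, using only the standard library.
def format_results(sizes, funcs, times):
    header = f"{'size':>11} " + " ".join(f"{f:>27}" for f in funcs)
    rows = [header]
    for size, row in zip(sizes, times):
        rows.append(f"{size:>11} " + " ".join(f"{t:>27.6f}" for t in row))
    return "\n".join(rows)
```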
This (possibly flawed) benchmark seemed to indicate that performance of xtensor for this basic function degraded for larger arrays as compared to NumPy.
size        numpy.sum  xtensor_basics.sum_pyarray  xtensor_basics.sum_pytensor
1            0.000268                    0.000039                     0.000039
10           0.000258                    0.000040                     0.000039
100          0.000247                    0.000048                     0.000049
1000         0.000288                    0.000167                     0.000164
10000        0.000568                    0.001353                     0.001341
100000       0.003087                    0.013033                     0.013038
1000000      0.045171                    0.132150                     0.132174
10000000     0.434112                    1.313274                     1.313434
100000000    4.180580                   13.129517                    13.129058
Any idea why I'm seeing this? I'm guessing it's something NumPy utilizes that xtensor does not (yet), but I wasn't sure what it could be for a reduction as simple as this. I dug through xmath.hpp but didn't see anything obvious, and nothing like this is referenced in the documentation.
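For context, one thing NumPy's np.sum does use that a naive accumulation loop does not is pairwise summation: the C implementation sums in blocks with an unrolled inner loop (which auto-vectorizes well) and combines the partial sums in a tree, which also improves floating-point accuracy. A rough Python sketch of the scheme (the block size of 8 here stands in for NumPy's actual block size of 128):

```python
# Sketch of pairwise summation, the scheme NumPy's np.sum uses internally.
# NumPy's C implementation uses a block size of 128 with an unrolled,
# SIMD-friendly inner loop; 8 is used here just to keep the sketch short.
def pairwise_sum(a, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    n = hi - lo
    if n <= 8:  # small base case stands in for NumPy's unrolled block
        total = 0.0
        for i in range(lo, hi):
            total += a[i]
        return total
    mid = lo + n // 2
    return pairwise_sum(a, lo, mid) + pairwise_sum(a, mid, hi)
```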
Versions
numpy 1.13.3
openblas 0.2.20
python 3.6.3
xtensor 0.12.1
xtensor-python 0.14.0
Wow, this is a coincidence! I am working on exactly this speedup!
xtensor's sum is a lazy operation -- and it doesn't use the most performant iteration order for (auto-)vectorization. However, we just added an evaluation_strategy parameter to reductions (and the upcoming accumulations) which allows you to select between immediate and lazy reductions.
Immediate reductions perform the reduction immediately (rather than lazily) and can use an iteration order optimized for vectorized reductions.
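As a loose analogy (this is plain Python, not xtensor's actual API), the difference between a lazy and an immediate reduction over an expression like `a + b` looks like this:

```python
# Loose Python analogy (not xtensor's API) of lazy vs. immediate reduction
# over an expression such as `a + b`.
def lazy_sum(a, b):
    # The expression is never materialized; each element is computed on
    # demand, in whatever order the expression iterator dictates.
    return sum(x + y for x, y in zip(a, b))

def immediate_sum(a, b):
    # The expression is evaluated into a concrete buffer first, so the
    # reduction can run a tight loop in a contiguous, SIMD-friendly order.
    buf = [x + y for x, y in zip(a, b)]
    return sum(buf)
```

Both return the same value; in C++ the immediate variant is what lets the compiler pick a vectorization-friendly iteration order.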
You can find this feature in this PR: https://github.com/QuantStack/xtensor/pull/550
In my benchmarks this should be at least as fast as, or faster than, NumPy. I hope to get it merged today.
Btw., please don't hesitate to drop by our Gitter channel and post a link to the question; we need to monitor StackOverflow better: https://gitter.im/QuantStack/Lobby