Faster alternative to numpy for manual element-wise operations on large arrays?

I have some code that was originally written in C (by someone else) using C-style malloc arrays. I later converted a lot of it to C++ style, using vector<vector<vector<complex>>> arrays for consistency with the rest of my project. I never timed it, but both methods seemed to be of similar speed.

I recently started a new project in python, and I wanted to use some of this old code. Not wanting to move data back and forth between projects, I decided to port this old code into python so that it's all in one project. I naively typed up all of the code in python syntax, replacing any arrays in the old code with numpy arrays (initialising them like this: array = np.zeros(list((1024, 1024)), dtype=complex)). The code works fine, but it is excruciatingly slow. If I had to guess, I would say it's on the order of 1000 times slower.

Now having looked into it, I see that a lot of people say numpy is very slow for element-wise operations. While I have used some of the numpy functions for common mathematical operations, such as FFTs and matrix multiplication, most of my code involves nested for loops. A lot of it is pretty complicated and doesn't seem to me to be amenable to reducing to simple array operations that are faster in numpy.

So, I'm wondering if there is an alternative to numpy that is faster for these kinds of calculations. The ideal scenario would be that there is a module that I can import that has a lot of the same functionality, so I don't have to rewrite much of my code (i.e., something that can do FFTs and initialise arrays in the same way, etc.), but failing that, I would be happy with something that I could at least use for the more computationally demanding parts of the code and cast back and forth between the numpy arrays as needed.

cpython arrays sounded promising, but a lot of benchmarks I've seen don't show enough of a difference in speed for my purposes. To give an idea of the kind of thing I'm talking about, this is one of the methods that is slowing down my code. This is called millions of times, and the vz_at() method contains a lookup table and does some interpolation to give the final return value:

    def tra(self, tr, x, y, z_number, i, scalex, idx, rmax2, rminsq):
        M = 1024
        ixo = int(x[i] / scalex)
        iyo = int(y[i] / scalex)
        nx1 = ixo - idx
        nx2 = ixo + idx
        ny1 = iyo - idx
        ny2 = iyo + idx

        for ix in range(nx1, nx2 + 1):
            rx2 = x[i] - float(ix) * scalex
            rx2 = rx2 * rx2
            ixw = ix
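            # wrap ixw into the valid index range [0, M)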
            while ixw < 0:
                ixw = ixw + M
            ixw = ixw % M
            for iy in range(ny1, ny2 + 1):
                rsq = y[i] - float(iy) * scalex
                rsq = rx2 + rsq * rsq
                if rsq <= rmax2:
                    iyw = iy
                    while iyw < 0:
                        iyw = iyw + M
                    iyw = iyw % M
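                    # clamp rsq to a minimum before the vz_at lookup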
                    if rsq < rminsq:
                        rsq = rminsq
                    vz = P.vz_at(z_number[i], rsq)
                    tr[ixw, iyw] += vz

All up, there are a couple of thousand lines of code; this is just a small snippet to give an example. To be clear, a lot of my arrays are 1024x1024x1024 or 1024x1024 and are complex-valued. Others are one-dimensional arrays on the order of a million elements. What's the best way I can speed these element-wise operations up?

For information, some of your code can be made more concise and thus a bit more readable. For instance:

array = np.zeros(list((1024, 1024)), dtype=complex)

can be written

array = np.zeros((1024, 1024), dtype=complex)

As you are trying out Python, this is at least a nice benefit :-)

Now, for your problem there are several solutions in the current Python scientific landscape:

  1. Numba is a just-in-time compiler for Python that is dedicated to array processing, achieving good performance when NumPy hits its limits.

    Pros: Little to no modification of your code as you just write plain Python, and it shows good performance in many situations. Numba should recognize some NumPy operations to avoid a Numba->Python->NumPy slowdown. A minimal sketch is shown after this list.
    Cons: Can be tedious to install and hence to distribute Numba-based code.

  2. Cython is a mix of Python and C used to generate compiled functions. You can start from a pure Python file and accelerate the code via type annotations and the use of some "C"-isms; a sketch of this is also shown after the list.

    Pros: stable, widely used, relatively easy to distribute Cython-based code.
    Cons: need to rewrite the performance-critical code, even if only in part.
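
To make option 1 concrete, here is a minimal sketch of what a Numba version of the tra() method from the question could look like. The vz_at() function below is only a hypothetical placeholder for the real lookup-table interpolation, which would itself need to be @njit-compatible; the loop body is a lightly condensed but equivalent version of the loops in the question.

    import numpy as np
    from numba import njit

    @njit
    def vz_at(z_number, rsq):
        # Hypothetical stand-in for P.vz_at(); the real table lookup and
        # interpolation would have to be written in an @njit-compatible way.
        return z_number / rsq

    @njit
    def tra_numba(tr, x, y, z_number, i, scalex, idx, rmax2, rminsq):
        M = tr.shape[0]
        ixo = int(x[i] / scalex)
        iyo = int(y[i] / scalex)
        for ix in range(ixo - idx, ixo + idx + 1):
            rx2 = (x[i] - ix * scalex) ** 2
            ixw = ix % M  # Python-style % already maps negative indices into [0, M)
            for iy in range(iyo - idx, iyo + idx + 1):
                rsq = rx2 + (y[i] - iy * scalex) ** 2
                if rsq <= rmax2:
                    tr[ixw, iy % M] += vz_at(z_number[i], max(rsq, rminsq))

    # Tiny dummy call; the real arrays would be 1024x1024 and complex-valued.
    tr = np.zeros((64, 64))
    x = np.random.rand(10) * 64
    y = np.random.rand(10) * 64
    z_number = np.random.rand(10) + 1.0
    for i in range(10):
        tra_numba(tr, x, y, z_number, i, 1.0, 3, 9.0, 0.25)

The first call pays the JIT compilation cost; after that the loops run as compiled code, which is where the gain comes from for a method called millions of times.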
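
For option 2, a comparable sketch in Cython's "pure Python" mode is shown below; the typing is done through the cython module, so the file still runs as ordinary Python and can be compiled with cythonize. Only the scalar loop variables are typed here, which is just the starting point: for real gains the arrays (and vz_at) would also need to be typed, e.g. as memoryviews, in the compiled module.

    # cython: language_level=3
    import cython

    @cython.locals(ix=cython.int, iy=cython.int, ixw=cython.int,
                   rx2=cython.double, rsq=cython.double)
    def tra_cy(tr, x, y, z_number, i, scalex, idx, rmax2, rminsq, vz_at, M=1024):
        # vz_at is passed in as a plain Python callable in this sketch.
        ixo = int(x[i] / scalex)
        iyo = int(y[i] / scalex)
        for ix in range(ixo - idx, ixo + idx + 1):
            rx2 = (x[i] - ix * scalex) ** 2
            ixw = ix % M
            for iy in range(iyo - idx, iyo + idx + 1):
                rsq = rx2 + (y[i] - iy * scalex) ** 2
                if rsq <= rmax2:
                    tr[ixw, iy % M] += vz_at(z_number[i], max(rsq, rminsq))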

As an additional hint, Nicolas Rougier (a French scientist) wrote an online book on many situations where you can make use of NumPy to speed up Python code: http://www.labri.fr/perso/nrougier/from-python-to-numpy/
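
In that spirit, the double loop in the question can also be expressed as whole-array NumPy operations, which is worth trying before reaching for a compiler. The sketch below makes one important assumption: vz_at() would have to be rewritten to accept an array of rsq values (a vectorised lookup/interpolation). np.add.at is used for the accumulation because the wrapped indices can repeat, and a plain fancy-indexed += would silently drop the repeated contributions.

    import numpy as np

    def tra_vectorised(tr, x, y, z_number, i, scalex, idx, rmax2, rminsq, vz_at):
        M = tr.shape[0]
        ixo = int(x[i] / scalex)
        iyo = int(y[i] / scalex)
        ix = np.arange(ixo - idx, ixo + idx + 1)
        iy = np.arange(iyo - idx, iyo + idx + 1)
        rx2 = (x[i] - ix * scalex) ** 2            # shape (nx,)
        ry2 = (y[i] - iy * scalex) ** 2            # shape (ny,)
        rsq = rx2[:, None] + ry2[None, :]          # shape (nx, ny) by broadcasting
        mask = rsq <= rmax2
        ixw, iyw = np.meshgrid(ix % M, iy % M, indexing='ij')
        vz = vz_at(z_number[i], np.maximum(rsq[mask], rminsq))  # assumes a vectorised vz_at
        np.add.at(tr, (ixw[mask], iyw[mask]), vz)  # correct even with repeated indices

Whether this beats a compiled version depends on the window size idx: for very small windows the temporary arrays dominate, while larger windows tend to favour the vectorised form.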
