
Faster alternative to numpy for manual element-wise operations on large arrays?

I have some code that was originally written in C (by someone else) using C-style malloc arrays. I later converted a lot of it to C++ style, using vector<vector<vector<complex>>> arrays for consistency with the rest of my project. I never timed it, but both methods seemed to be of similar speed.

I recently started a new project in Python, and I wanted to use some of this old code. Not wanting to move data back and forth between projects, I decided to port the old code into Python so that it's all in one project. I naively typed up all of the code in Python syntax, replacing any arrays in the old code with numpy arrays (initialising them like this: array = np.zeros(list((1024, 1024)), dtype=complex)). The code works fine, but it is excruciatingly slow. If I had to guess, I would say it's on the order of 1000 times slower.

Now having looked into it, I see that a lot of people say numpy is very slow for element-wise operations. While I have used some of the numpy functions for common mathematical operations, such as FFTs and matrix multiplication, most of my code involves nested for loops. A lot of it is pretty complicated and doesn't seem to me to be amenable to being reduced to simple array operations that would be faster in numpy.

So, I'm wondering if there is an alternative to numpy that is faster for these kinds of calculations. The ideal scenario would be a module that I can import that has a lot of the same functionality, so I don't have to rewrite much of my code (i.e., something that can do FFTs and initialise arrays in the same way, etc.), but failing that, I would be happy with something that I could at least use for the more computationally demanding parts of the code, casting back and forth between the numpy arrays as needed.

CPython arrays sounded promising, but a lot of the benchmarks I've seen don't show enough of a difference in speed for my purposes. To give an idea of the kind of thing I'm talking about, here is one of the methods that is slowing down my code. It is called millions of times, and the vz_at() method contains a lookup table and does some interpolation to give the final return value:

    def tra(self, tr, x, y, z_number, i, scalex, idx, rmax2, rminsq):
        # Add the contribution of point i to tr over a (2*idx + 1)-wide
        # square window of grid cells around its position.
        M = 1024
        ixo = int(x[i] / scalex)
        iyo = int(y[i] / scalex)
        nx1 = ixo - idx
        nx2 = ixo + idx
        ny1 = iyo - idx
        ny2 = iyo + idx

        for ix in range(nx1, nx2 + 1):
            rx2 = x[i] - float(ix) * scalex
            rx2 = rx2 * rx2
            # wrap ix into [0, M) (periodic boundary)
            ixw = ix
            while ixw < 0:
                ixw = ixw + M
            ixw = ixw % M
            for iy in range(ny1, ny2 + 1):
                rsq = y[i] - float(iy) * scalex
                rsq = rx2 + rsq * rsq
                if rsq <= rmax2:
                    # wrap iy into [0, M)
                    iyw = iy
                    while iyw < 0:
                        iyw = iyw + M
                    iyw = iyw % M
                    if rsq < rminsq:
                        rsq = rminsq
                    # interpolated table lookup; this is the hot inner call
                    vz = P.vz_at(z_number[i], rsq)
                    tr[ixw, iyw] += vz

All up, there are a couple of thousand lines of code; this is just a small snippet to give an example. To be clear, a lot of my arrays are 1024x1024x1024 or 1024x1024 and are complex-valued. Others are one-dimensional arrays on the order of a million elements. What's the best way I can speed these element-wise operations up?

As an aside, some of your code can be made more concise, and thus a bit more readable. For instance:

array = np.zeros(list((1024, 1024)), dtype=complex)

can be written

array = np.zeros((1024, 1024), dtype=complex)

As you are trying out Python, this is at least a nice benefit :-)

Now, for your problem there are several solutions in the current Python scientific landscape:

  1. Numba is a just-in-time compiler for Python that is dedicated to array processing, achieving good performance when NumPy hits its limits (a minimal sketch of how it could be applied to your tra() method follows after this list).

    Pros: little to no modification of your code, as you just write plain Python; shows good performance in many situations. Numba should recognize some NumPy operations to avoid a Numba->Python->NumPy slowdown.
    Cons: can be tedious to install, and hence to distribute Numba-based code.

  2. Cython is a mix of Python and C used to generate compiled functions. You can start from a pure Python file and accelerate the code via type annotations and the use of some "C"-isms.

    Pros: stable, widely used, and relatively easy to distribute Cython-based code.
    Cons: you need to rewrite the performance-critical code, even if only in part.
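
To make option 1 concrete, here is a minimal sketch of what your tra() loop could look like under Numba, written as a standalone @njit function (plain @njit does not compile bound methods; numba's jitclass is another route). The names tra_jit, vz_lookup, r_grid and v_grid are mine, not from your code: vz_lookup is only a self-contained stand-in for your real vz_at() interpolation, and the z_number selection is left out to keep the example short.

    import numpy as np
    from numba import njit

    @njit
    def vz_lookup(r_grid, v_grid, rsq):
        # Stand-in for P.vz_at(): plain linear interpolation on a precomputed
        # (r^2, value) table, only here so the sketch is self-contained.
        j = np.searchsorted(r_grid, rsq)
        if j <= 0:
            return v_grid[0]
        if j >= r_grid.shape[0]:
            return v_grid[-1]
        t = (rsq - r_grid[j - 1]) / (r_grid[j] - r_grid[j - 1])
        return v_grid[j - 1] + t * (v_grid[j] - v_grid[j - 1])

    @njit
    def tra_jit(tr, x, y, i, scalex, idx, rmax2, rminsq, r_grid, v_grid):
        # Same logic as the posted tra(), compiled to machine code by Numba.
        M = tr.shape[0]
        ixo = int(x[i] / scalex)
        iyo = int(y[i] / scalex)
        for ix in range(ixo - idx, ixo + idx + 1):
            rx2 = (x[i] - ix * scalex) ** 2
            ixw = ix % M                      # Python's % already wraps negative indices
            for iy in range(iyo - idx, iyo + idx + 1):
                rsq = rx2 + (y[i] - iy * scalex) ** 2
                if rsq <= rmax2:
                    iyw = iy % M
                    if rsq < rminsq:
                        rsq = rminsq
                    tr[ixw, iyw] += vz_lookup(r_grid, v_grid, rsq)

The first call triggers compilation, so benchmark from the second call onwards; after that, this kind of scalar inner loop typically runs at close to C speed, and complex-valued tr arrays are handled natively.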

As an additional hint, Nicolas Rougier (a French scientist) wrote an online book on many situations where you can make use of NumPy to speed up Python code: http://www.labri.fr/perso/nrougier/from-python-to-numpy/
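
For instance, here is a minimal sketch of the broadcasting style that book teaches, applied to the distance test in your tra() method (the squared_distance_grid helper and its arguments are mine, purely for illustration): the whole window of rsq values is built in one operation instead of two nested loops.

    import numpy as np

    def squared_distance_grid(xi, yi, ixo, iyo, idx, scalex):
        # All the rsq values of tra()'s double loop, computed at once by broadcasting.
        ix = np.arange(ixo - idx, ixo + idx + 1)        # window indices along x
        iy = np.arange(iyo - idx, iyo + idx + 1)        # window indices along y
        rx2 = (xi - ix * scalex) ** 2                   # shape (2*idx + 1,)
        ry2 = (yi - iy * scalex) ** 2                   # shape (2*idx + 1,)
        return rx2[:, None] + ry2[None, :]              # shape (2*idx + 1, 2*idx + 1)

    # The test and the wrapped indices vectorize the same way:
    #   rsq = squared_distance_grid(x[i], y[i], ixo, iyo, idx, scalex)
    #   mask = rsq <= rmax2
    #   ixw = np.arange(ixo - idx, ixo + idx + 1) % 1024   # % wraps negative indices
    # Only the interpolating vz_at() lookup would still need a loop, or a vectorized
    # equivalent such as np.interp over the masked rsq values.

Whether this pays off depends on how much of the running time is actually inside vz_at(); if that call dominates, Numba (option 1 above) is probably the easier win.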
