Optimizing array operations in Python with Numpy

Question

I'm baffled. I just ported my code from Java to Python. The good news is that the Python alternative for the lib I'm using is much quicker. The bad part is that my custom processing code is much slower in the Python version I wrote :( I even removed some parts I deemed unnecessary, and it is still much slower. The Java version took about half a second; Python takes 5-6 seconds.

import time

import imageio
import numpy as np

rimg1 = imageio.imread('test1.png').astype(np.uint8)
rimg2 = imageio.imread('test2.png').astype(np.uint8)

sum_time = 0
for offset in range(-left, right):
    rdest = np.zeros((h, w, 3)).astype(np.uint8)

    if offset == 0:
        continue

    mult = np.uint8(1.0 / (offset * multiplier / frames))
    for y in range(h):
        for x in range(0, w - backup, 1):
            slice_time = time.time()
            src = rimg2[y,x] // mult + 1
            sum_time += time.time() - slice_time

            pix = rimg1[y,x + backup]

w and h are both about 384. src usually ranges from 0 to 30. The offset range (-left to right) is -5 to 5.

How come sum_time takes about a third of my total time?

Edit

With the help of josephjscheidt I made some changes.

mult = np.uint8(1.0 / (offset * multiplier / frames))
multArray = np.floor_divide(rimg2, mult) + 1
for y in range(h):
    pixy = rimg1[y]
    multy = multArray[y]
    for x in range(0, w - backup, 1):
        src = multy[y]
        slice_time = time.time()
        pix = pixy[x + backup]
        sum_time += time.time() - slice_time
        ox = x
        for o in range(src):
            if ox < 0:
                break

            rdest[y,ox] = pix
            ox-=1

Using the numpy iterator for the srcArray cuts total time almost in half! The numpy operation itself seems to take negligible time.

Now most of the time taken is in the rimg1 lookup

pix = rimg1[x + backup]

and the inner for loop (each taking 50% of the time). Is it possible to handle this with numpy operations as well?

Edit

I figured rewriting it could be of benefit, but somehow the following actually takes a little bit longer:

    for x in range(0, w - backup, 1):
        slice_time = time.time()
        lastox = max(x-multy[y], 0)
        rdest[y,lastox:x] = pixy[x + backup]
        sum_time += time.time() - slice_time

Edit

            slice_time = time.time()
            depth = multy[y]
            pix = pixy[x + backup]

            ox = x

            #for o in range(depth):
            #    if ox < 0:
            #        break;
            #
            #    rdesty[ox] = pix
            #    ox-=1

            # if I uncomment the above lines, and comment out the following two
            # it takes twice as long!
            lastox = max(x-multy[y], 0)
            rdesty[lastox:x] = pixy[x + backup]

            sum_time += time.time() - slice_time

The Python interpreter is strange..

Time taken is now 2.5 seconds for sum_time. In comparison, Java does it in 60 ms.

Answer 1

For loops are notoriously slow with numpy arrays, and you have a three-layer for loop here. The underlying concept with numpy arrays is to perform operations on the entire array at once, rather than trying to iterate over them.

Although I can't entirely interpret your code, because most of the variables are undefined in the code chunk you provided, I'm fairly confident you can refactor here and vectorize your commands to remove the loops. For instance, if you redefine offset as a one-dimensional array, then you can calculate all values of mult at once without having to invoke a for loop: mult will become a one-dimensional array holding the correct values. We can avoid dividing by zero using the out argument (setting the default output to the offset array) and the where argument (performing the calculation only where offset doesn't equal zero):
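As a rough illustration of the difference, the sketch below computes the question's `rimg2[y,x] // mult + 1` once per pixel and once as a single whole-array expression. The image is synthetic and the value of `mult` is a hypothetical stand-in, since neither is given in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w = 64, 64                      # small stand-in; the question uses about 384 x 384
rimg2 = rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)  # synthetic image
mult = np.uint8(8)                 # hypothetical divisor, not from the question

# Element by element, as in the question's inner loop:
# every iteration pays for Python-level indexing and dispatch.
looped = np.empty_like(rimg2)
for y in range(h):
    for x in range(w):
        looped[y, x] = rimg2[y, x] // mult + 1

# Whole-array form: one expression, evaluated in compiled code.
vectorized = rimg2 // mult + 1
```

Both forms produce the same array; only the whole-array form avoids the per-pixel interpreter overhead.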

mult = np.uint8(np.divide(1.0, (offset * multiplier / frames),
                          out = offset, where = (offset != 0)))
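A self-contained sketch of the masked division follows. The values of `multiplier` and `frames` are hypothetical, as the question does not give them. It also assumes the output buffer should be a separate float array rather than `offset` itself, because writing float results into an integer array raises a casting error. The cast to `np.uint8` is left out, since negative offsets give negative values that do not fit in an unsigned byte.

```python
import numpy as np

left, right = 5, 5                 # the question says offsets run from -5 to 5
multiplier, frames = 1.0, 30.0     # hypothetical values, not given in the question

offset = np.arange(-left, right)   # same values as range(-left, right)
scaled = offset * multiplier / frames   # float array, zero where offset == 0

# Pre-filled float output: entries skipped by `where` keep their initial 0.0.
inv = np.zeros_like(scaled)
np.divide(1.0, scaled, out=inv, where=(offset != 0))
```

Entries where `offset` is zero are never divided, so no divide-by-zero warning is triggered and `inv` stays 0.0 there.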

Then, to use the mult array on the rimg2 row by row, you can use a broadcasting trick (here, I'm assuming you want to add one to each element in rimg2):

src = np.floor_divide(rimg2, mult[:,None], out = rimg2, where = (mult != 0)) + 1
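One way to make the shapes line up, assuming the goal is one `src` image per offset, is to give `mult` three trailing axes so that it broadcasts against the `(h, w, 3)` image. This is a sketch under that assumption; the divisor values are hypothetical and chosen to be nonzero.

```python
import numpy as np

rng = np.random.default_rng(0)
rimg2 = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)   # tiny synthetic image
mult = np.array([6, 10, 15, 30], dtype=np.uint8)               # hypothetical nonzero divisors

# (1, h, w, 3) // (n, 1, 1, 1) broadcasts to (n, h, w, 3):
# src[i] is the image divided by mult[i], plus one.
src = rimg2[None, ...] // mult[:, None, None, None] + 1
```

Unlike passing `out = rimg2`, this leaves the input image untouched, which matters if rimg2 is reused for the next offset.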

I found this article extremely helpful when learning how to effectively work with numpy arrays:

https://realpython.com/numpy-array-programming/

Since you are working with images, you may want to especially pay attention to the section on image feature extraction and stride_tricks. Anyway, I hope this helps you get started.
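The inner fill loop from the question can also be partly vectorized. The sketch below keeps the loop over x but handles all rows at once with a boolean mask. It assumes the per-pixel run length is available as a 2-D integer array (called `depth` here, a hypothetical name) and that `rimg1` has shape `(h, w, 3)`. The function names are illustrative, and `smear_loop` is only a reference that mirrors the question's triple loop.

```python
import numpy as np

def smear_loop(rimg1, depth, backup):
    """Reference: the question's triple loop, with depth[y, x] as the run length."""
    h, w = depth.shape
    rdest = np.zeros_like(rimg1)
    for y in range(h):
        for x in range(w - backup):
            pix = rimg1[y, x + backup]
            ox = x
            for _ in range(int(depth[y, x])):
                if ox < 0:
                    break
                rdest[y, ox] = pix
                ox -= 1
    return rdest

def smear_columns(rimg1, depth, backup):
    """Same result, looping over x only and processing every row at once."""
    depth = depth.astype(np.intp)            # avoid uint8 wrap-around in x - depth
    h, w = depth.shape
    rdest = np.zeros_like(rimg1)
    max_depth = int(depth.max())
    for x in range(w - backup):
        lo = max(x - max_depth + 1, 0)       # leftmost column any row can touch
        cols = np.arange(lo, x + 1)
        start = x - depth[:, x] + 1          # first column written in each row
        mask = cols[None, :] >= start[:, None]        # shape (h, k)
        window = rdest[:, lo:x + 1]                   # view into rdest, shape (h, k, 3)
        pix = rimg1[:, x + backup]                    # shape (h, 3)
        # Copy each row's pixel into the masked columns of that row.
        window[mask] = np.broadcast_to(pix[:, None, :], window.shape)[mask]
    return rdest
```

Because columns are still visited left to right, later writes overwrite earlier ones exactly as in the original loop. This cuts the Python-level iterations from roughly h * w * depth to w, at the cost of building a small mask per column.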

Answer 1: 3 votes, accepted, 2019-08-03 02:46:58
