
Why is accessing a NumPy array 6 times slower than a Pillow image with Cython?

I have an RGB image where every pixel has to be recalculated with a special formula, and for performance reasons I'm using Cython for the hot loops.

My current code passes a Pillow pixel-access object (the result of Image.load()) to the Cython code, where each pixel is read and written as image[x, y]. Those reads and writes go through Python's __getitem__/__setitem__ and are therefore very slow (I measured it: get/set item takes 50% of the function's time).

My idea was to use a NumPy array, and possibly a Cython memory view, to speed this up, but the modified function is now 6 times slower!

(BTW, the calculate_color function I actually use is way more complex than the example below, so please don't suggest rewriting it using FOO or BAR. I'm trying to compare r_001 and r_002.)

Code: main.py

from PIL import Image
import numpy as np
import pyximport
pyximport.install(setup_args={"include_dirs":np.get_include()})
import rendering

size = 2000, 2000
mode = 'RGB'
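mult = np.ones(size[0] * size[1], dtype='l')  # 'l' is the C long dtype that np.ndarray[long] expects; np.long does not exist in NumPy 1.24-1.26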

im_new = Image.new(mode, size)
rendering.r_001(im_new.load(), mult, size[0], size[1])

im_new = Image.new(mode, size)
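rendering.r_002(np.array(im_new), mult, size[0], size[1])  # np.array() copies the pixels into a (h, w, 3) uint8 array; use Image.fromarray() to get an image back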

Code: rendering.pyx

cimport numpy as np
np.import_array()
import numpy as np

def r_001(object image, np.ndarray[long, ndim=1] multiplier, long w, long h):
    cdef long x, y, x_index, y_index

    for y in range(h):  # was: for y from 0 <= y < h-1, which skipped the last row (and the loop below the last column)
        y_index = w * y
        for x in range(w):
            x_index = x + y_index
            m = multiplier[x_index]
            r, g, b = image[x, y]
            image[x, y] = calculate_color(m, r, g, b)


def r_002(np.ndarray[char, ndim=3] image, np.ndarray[long, ndim=1] multiplier, long w, long h):
    cdef long x, y, x_index, y_index

    for y in range(h):
        y_index = w * y
        for x in range(w):
            x_index = x + y_index
            m = multiplier[x_index]
            r, g, b = image[x, y]  # NumPy arrays are indexed [row, col] = [y, x]; [x, y] only works because the image is square
            r, g, b = calculate_color(m, r, g, b)
            image[x, y] = <char>r, <char>g, <char>b


cdef inline tuple calculate_color(long m, long r, long g, long b):
    cdef long a = 75
    r = (a * m + r * m) // 256
    g = (a * m + g * m) // 256
    b = (a * m + b * m) // 256
    if r > 255: r = 255
    if g > 255: g = 255
    if b > 255: b = 255
    return r, g, b

Using cProfile, r_001 takes 1.53 s to run, while r_002 takes 9.68 s.
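For checking expected values, here is the example formula mirrored in plain Python (a sketch only, not a replacement for the Cython version). Note that with mult filled with ones, as in main.py, every channel becomes (75 + c) // 256, which is 0 or 1, so the output is almost black; the per-pixel work, and therefore the timing comparison, is the same either way.

```python
def calculate_color(m, r, g, b):
    """Plain-Python mirror of calculate_color from rendering.pyx."""
    a = 75
    # Each channel: (a*m + c*m) // 256, clamped to 255
    r = min((a * m + r * m) // 256, 255)
    g = min((a * m + g * m) // 256, 255)
    b = min((a * m + b * m) // 256, 255)
    return r, g, b


print(calculate_color(1, 0, 128, 255))     # (0, 0, 1)
print(calculate_color(256, 10, 100, 200))  # (85, 175, 255)
```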

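Answer

You can get a good idea of what's going wrong by running cython -a on rendering.pyx and looking at the annotated HTML it generates (lines highlighted in yellow interact with Python; the darker the yellow, the more Python C-API calls). Essentially, the problem lines are: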

r, g, b = image[x, y]
# ...
image[x, y] = <char>r, <char>g, <char>b

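Cython's typed buffer indexing is only fast when you index an individual element, i.e. supply all three indices as C integers. image[x, y] is partial indexing, so Cython falls back on the generic Python __getitem__, which builds a new 1-D NumPy array for every pixel, and then (in this case) unpacks it into three Python objects; the assignment line packs a Python tuple and goes through __setitem__. One way of rewriting the code would be: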

r = image[x, y, 0]
g = image[x, y, 1]
b = image[x, y, 2]
r, g, b = calculate_color(m, r, g, b)
image[x, y, 0] = r
image[x, y, 1] = g
image[x, y, 2] = b
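When every index is a C integer, Cython turns a typed buffer access into pointer arithmetic: the element address is the buffer start plus each index times its stride. For a C-contiguous uint8 array of shape (h, w, 3), as np.array(im_new) produces, the strides are (3*w, 3, 1), so element [y, x, c] sits at byte offset (y*w + x)*3 + c. The plain-Python sketch below mimics that on a flat bytearray; offset and render_flat are hypothetical names for illustration, and the loop uses the row-major [y, x] order. It is slow in pure Python; the point is only the index arithmetic, one byte per access with no per-pixel array.

```python
def offset(y, x, c, w, channels=3):
    """Byte offset of element [y, x, c] in a C-contiguous (h, w, channels) uint8 buffer."""
    return (y * w + x) * channels + c


def calculate_color(m, r, g, b):
    a = 75
    return (min((a * m + r * m) // 256, 255),
            min((a * m + g * m) // 256, 255),
            min((a * m + b * m) // 256, 255))


def render_flat(buf, multiplier, w, h):
    """Hypothetical pure-Python analogue of r_002 using single-element indexing only."""
    for y in range(h):
        for x in range(w):
            m = multiplier[y * w + x]
            r = buf[offset(y, x, 0, w)]
            g = buf[offset(y, x, 1, w)]
            b = buf[offset(y, x, 2, w)]
            r, g, b = calculate_color(m, r, g, b)
            buf[offset(y, x, 0, w)] = r
            buf[offset(y, x, 1, w)] = g
            buf[offset(y, x, 2, w)] = b


pixels = bytearray([10, 100, 200, 0, 0, 0])  # one row with two RGB pixels
render_flat(pixels, [256, 1], 2, 1)
print(list(pixels))  # [85, 175, 255, 0, 0, 0]
```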

You could also speed up calculate_color by having it return a "ctuple", a C-level tuple type that Cython passes by value without creating a Python tuple object:

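cdef inline (unsigned char, unsigned char, unsigned char) calculate_color(long m, long r, long g, long b):
    ...  # body as before; the clamped 0-255 results fit in unsigned char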

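You should also declare r, g and b in r_002 as C variables (cdef unsigned char r, g, b) so they never become Python objects. Use unsigned char rather than char: np.array(im_new) has dtype uint8, so the buffer argument should likewise be np.ndarray[unsigned char, ndim=3] (or a memoryview such as unsigned char[:, :, :]); with a signed char, pixel values above 127 would come out negative.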


If it were me, I'd probably make image a 2D array of 32-bit ints and get the separate colours with bit masking. It would be a more significant change to your code, but it would make the indexing easier.
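A sketch of the packing and unpacking that the 32-bit suggestion implies, in plain Python so it can be checked without compiling. The 0x00RRGGBB layout and the helper names pack_rgb/unpack_rgb are assumptions for illustration; in Cython the same shifts and masks would run on a typed uint32 buffer, and the real layout depends on how the array is built (for example, viewing RGBA bytes as uint32 puts R in the lowest byte on little-endian machines).

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channels into one int laid out as 0x00RRGGBB."""
    return (r << 16) | (g << 8) | b


def unpack_rgb(p):
    """Recover (r, g, b) from a 0x00RRGGBB int using shifts and 0xFF masks."""
    return (p >> 16) & 0xFF, (p >> 8) & 0xFF, p & 0xFF


p = pack_rgb(0x12, 0x34, 0x56)
print(hex(p))         # 0x123456
print(unpack_rgb(p))  # (18, 52, 86)
```

With this layout each pixel is one element, so a 2D buffer needs only image[y, x] to read or write a whole pixel.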


cProfile is a poor way to time individual functions: it adds overhead to every function call it records, so a function that makes many calls can look much slower than it really is. That tends not to apply to Cython code, because cProfile can't "see inside" compiled Cython functions (unless they are compiled with the profile=True directive). cProfile is good for getting an overview of a whole program and where it spends its time, but to measure the performance of a small, self-contained chunk of code, use timeit or something similar instead.
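A minimal timeit sketch; process_pixels is a hypothetical stand-in for r_001/r_002 (those need the compiled module), and the sizes and repeat counts are arbitrary. Taking the minimum of several repeats is the usual way to reduce noise from other processes.

```python
import timeit


def process_pixels(n):
    """Hypothetical stand-in for r_001 / r_002: some per-pixel integer work."""
    total = 0
    for i in range(n):
        total += (75 * i + i) // 256
    return total


# Call the function 10 times per measurement, repeat the measurement 5 times,
# and keep the fastest run as the least noisy estimate.
times = timeit.repeat(lambda: process_pixels(10_000), number=10, repeat=5)
best = min(times) / 10
print(f"best: {best * 1e3:.3f} ms per call")
```

With the compiled module the call would be along the lines of timeit.repeat(lambda: rendering.r_002(arr, mult, w, h), number=1, repeat=5); keep in mind that r_002 modifies arr in place, so later repeats process already-transformed pixels.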
