I have an RGB image where every pixel has to be recalculated with a special formula and for performance reasons I'm using Cython for the hot loops.
My code works with a Pillow Image object passed to Cython code, where each pixel is accessed as image[x, y] = something, which goes through Python's __getitem__ and is therefore very slow (I measured it: about 50% of the function's time is spent in get/set item).
My idea was to use a numpy array, and possibly a Cython memory view, to speed this up, but the changed function is now about 6 times slower!
(BTW, the actual calculate_color function I use is far more complex than the example below, so please don't ask me to rewrite the piece using FOO or BAR. I'm only trying to compare r_001 and r_002.)
Code main.py
from PIL import Image
import numpy as np
import pyximport
pyximport.install(setup_args={"include_dirs":np.get_include()})
import rendering
size = 2000, 2000
mode = 'RGB'
mult = np.ones(size[0] * size[1], np.long)
im_new = Image.new(mode, size)
rendering.r_001(im_new.load(), mult, size[0], size[1])
im_new = Image.new(mode, size)
rendering.r_002(np.array(im_new), mult, size[0], size[1])
Code rendering.pyx
cimport numpy as np
np.import_array()
import numpy as np

def r_001(object image, np.ndarray[long, ndim=1] multiplier, long w, long h):
    cdef long x, y, x_index, y_index
    for y from 0 <= y < h-1:
        y_index = w * y
        for x from 0 <= x < w-1:
            x_index = x + y_index
            m = multiplier[x_index]
            r, g, b = image[x, y]
            image[x, y] = calculate_color(m, r, g, b)

def r_002(np.ndarray[char, ndim=3] image, np.ndarray[long, ndim=1] multiplier, long w, long h):
    cdef long x, y, x_index, y_index
    for y from 0 <= y < h-1:
        y_index = w * y
        for x from 0 <= x < w-1:
            x_index = x + y_index
            m = multiplier[x_index]
            r, g, b = image[x, y]
            r, g, b = calculate_color(m, r, g, b)
            image[x, y] = <char>r, <char>g, <char>b

cdef inline tuple calculate_color(long m, long r, long g, long b):
    cdef long a = 75
    r = (a * m + r * m) // 256
    g = (a * m + g * m) // 256
    b = (a * m + b * m) // 256
    if r > 255: r = 255
    if g > 255: g = 255
    if b > 255: b = 255
    return r, g, b
Using cProfile, I get that r_001 takes 1.53s to run while r_002 takes 9.68s.
You can get a good idea of what's going wrong by running cython -a on the code and looking at the annotated HTML. Essentially, the problem lines are:
r, g, b = image[x, y]
# ...
image[x, y] = <char>r, <char>g, <char>b
Cython is usually only quick when indexing individual elements. For partial indexing (here, giving only two indices to a 3D array) it falls back on __getitem__ and then, in this case, tuple unpacking and packing. One way of rewriting the code would be:
r = image[x, y, 0]
g = image[x, y, 1]
b = image[x, y, 2]
# and equivalently for the assignment
You could also look at speeding up calculate_color by having it return a "ctuple":
cdef (char, char, char) calculate_color(...):  # arguments as before
You should also set the type of r, g and b (in r_002) to char.
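One caveat on the char suggestion, stated as an assumption about your platform rather than something measured here: plain C char is signed on many common compilers, so a channel value above 127 would wrap to a negative number. The sketch below uses the standard ctypes module purely to illustrate that wrap-around; an unsigned 8-bit type (unsigned char in Cython, uint8 in NumPy) does not have this problem.

```python
import ctypes

channel = 200  # an example 8-bit colour channel value above 127

# ctypes integer types do no overflow checking, so this mimics a C cast.
as_signed = ctypes.c_byte(channel).value     # signed 8-bit interpretation
as_unsigned = ctypes.c_ubyte(channel).value  # unsigned 8-bit interpretation

print(as_signed, as_unsigned)
```

If your real values can exceed 127, declaring the buffer and the locals as unsigned char is probably the safer choice.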
If it were me, I'd probably make image a 2D array of 32-bit ints and get the separate colours with bit masking. It would be a more significant change to your code, but it would make the indexing easier.
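As a rough illustration of the bit-masking idea, here is a minimal pure-Python sketch. The function names pack_rgb and unpack_rgb and the 0x00RRGGBB layout are assumptions chosen for the example, not anything taken from your code; the same shifts and masks should carry over to typed integers in Cython.

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channels into one integer laid out as 0x00RRGGBB."""
    return ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF)


def unpack_rgb(pixel):
    """Recover the (r, g, b) channels from a 0x00RRGGBB integer."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


pixel = pack_rgb(12, 200, 255)
r, g, b = unpack_rgb(pixel)
print(hex(pixel), (r, g, b))
```

With this layout each pixel is a single element of the 2D array, so image[x, y] is a full index and never needs the partial-indexing fallback.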
cProfile is a bad way of timing individual functions: depending on their contents, it can add a lot of overhead when they make calls. This tends not to be the case for Cython, since cProfile can't "see inside" Cython functions. It's good for getting an overview of a whole program and where it's spending time, but for measuring the performance of a small, self-contained chunk, use timeit or similar instead.
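A minimal sketch of timing with timeit follows. The work function is a hypothetical stand-in for a call such as rendering.r_002(...), and the repeat and number values are arbitrary choices for the example.

```python
import timeit


def work(n=10_000):
    # Hypothetical stand-in for the function being measured.
    total = 0
    for i in range(n):
        total += (75 * i + i) // 256
    return total


calls_per_run = 20
# Run the callable 20 times per measurement and take 5 measurements.
timings = timeit.repeat(work, number=calls_per_run, repeat=5)

# The minimum is usually the least noisy estimate of the achievable time.
best = min(timings) / calls_per_run
print(f"best of 5: {best * 1e3:.3f} ms per call")
```

To time your own functions, pass a lambda that calls them with freshly prepared arguments in place of work.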