
Is there no faster way to convert (BGR) OpenCV image to CMYK?

I have an OpenCV image, as usual in the BGR color space, and I need to convert it to CMYK. I searched online but found essentially only (slight variations of) the following approach:

import numpy

def bgr2cmyk(cv2_bgr_image):
    bgrdash = cv2_bgr_image.astype(float) / 255.0

    # Calculate K as (1 - whatever is biggest out of Rdash, Gdash, Bdash)
    K = 1 - numpy.max(bgrdash, axis=2)

    with numpy.errstate(divide="ignore", invalid="ignore"):
        # Calculate C
        C = (1 - bgrdash[..., 2] - K) / (1 - K)
        C = 255 * C
        C = C.astype(numpy.uint8)

        # Calculate M
        M = (1 - bgrdash[..., 1] - K) / (1 - K)
        M = 255 * M
        M = M.astype(numpy.uint8)

        # Calculate Y
        Y = (1 - bgrdash[..., 0] - K) / (1 - K)
        Y = 255 * Y
        Y = Y.astype(numpy.uint8)

    return (C, M, Y, K)

This works fine, but it feels quite slow - for an 800 x 600 px image it takes about 30 ms on my i7 CPU. Typical cv2 operations like thresholding and the like take only a few ms on the same image, so since this is all numpy, I was expecting this CMYK conversion to be faster.

However, I haven't found anything that makes it significantly faster. There is a conversion to CMYK via PIL.Image, but the resulting channels do not look like those produced by the algorithm above.

Any other ideas?

I would start by profiling which part is the bottleneck.

E.g., how fast is it without the / (1 - K) division? Precalculating 1 / (1 - K) might help; you can even precalculate 255 / (1 - K) directly:

K = 1 - numpy.max(bgrdash, axis=2)
kRez255 = 255 / (1 - K)

with numpy.errstate(divide="ignore", invalid="ignore"):
    # Calculate C
    C = (1 - bgrdash[..., 2] - K) * kRez255
    C = C.astype(numpy.uint8)

    # Calculate M
    M = (1 - bgrdash[..., 1] - K) * kRez255
    M = M.astype(numpy.uint8)

    # Calculate Y
    Y = (1 - bgrdash[..., 0] - K) * kRez255
    Y = Y.astype(numpy.uint8)

return (C, M, Y, K)

But only profiling can show if it is the calculation at all which slows down the conversion.
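To make that concrete, here is a minimal profiling sketch of exactly this comparison (random test data stands in for a real image; the variable names follow the snippet above, and timings will vary by machine):

```python
import timeit

import numpy

# Random stand-in for an 800 x 600 BGR image.
img = numpy.random.randint(0, 256, (600, 800, 3), dtype=numpy.uint8)

bgrdash = img.astype(float) / 255.0
K = 1 - numpy.max(bgrdash, axis=2)

with numpy.errstate(divide="ignore", invalid="ignore"):
    kRez255 = 255 / (1 - K)  # precomputed reciprocal, as suggested above
    t_div = timeit.timeit(
        lambda: (1 - bgrdash[..., 2] - K) / (1 - K), number=100)
    t_mul = timeit.timeit(
        lambda: (1 - bgrdash[..., 2] - K) * kRez255, number=100)

print(f"division: {t_div / 100 * 1e3:.2f} ms, multiply: {t_mul / 100 * 1e3:.2f} ms")
```

This only times one channel's arithmetic, which is enough to see whether the division is the bottleneck at all.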

There are several things you should do:

  • shake the math
  • use integer math where possible
  • optimize beyond what numpy can do
Shaking the math

Given

RGB' = RGB / 255
K = 1 - max(RGB')
C = (1-K - R') / (1-K)
M = (1-K - G') / (1-K)
Y = (1-K - B') / (1-K)

Notice what can be factored out:

RGB' = RGB / 255
J = max(RGB')
K = 1 - J
C = (J - R') / J
M = (J - G') / J
Y = (J - B') / J
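In NumPy, the factored form maps to something like the following sketch (the name bgr2cmyk_v2 is mine; unlike the question's code, K is returned scaled to 0-255 like the other channels, and NaNs from black pixels are mapped to 0):

```python
import numpy

def bgr2cmyk_v2(bgr_img):
    bgrdash = bgr_img.astype(numpy.float32) / 255.0
    J = numpy.max(bgrdash, axis=2)       # J = max(R', G', B')
    K = 255 - bgr_img.max(axis=2)        # integer max avoids rounding
    with numpy.errstate(divide="ignore", invalid="ignore"):
        # Each channel is now one subtraction and one division by J.
        C = numpy.nan_to_num(255 * (J - bgrdash[..., 2]) / J).astype(numpy.uint8)
        M = numpy.nan_to_num(255 * (J - bgrdash[..., 1]) / J).astype(numpy.uint8)
        Y = numpy.nan_to_num(255 * (J - bgrdash[..., 0]) / J).astype(numpy.uint8)
    return (C, M, Y, K)
```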
Integer math

Don't normalize to [0,1] for these calculations. The max() can be done on integers. The differences can too. K can be calculated entirely with integer math.

J = max(RGB)
K = 255 - J
C = 255 * (J - R) / J
M = 255 * (J - G) / J
Y = 255 * (J - B) / J
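A vectorized NumPy version of the integer variant might look like this (my sketch: the input is widened to uint16 so that 255 * (J - channel), at most 65025, cannot overflow, and integer floor division by a zero J silently yields 0 for black pixels):

```python
import numpy

def bgr2cmyk_v3(bgr_img):
    # Widen to uint16 so 255 * (J - channel) cannot overflow.
    bgr16 = bgr_img.astype(numpy.uint16)
    J = bgr16.max(axis=2)
    K = (255 - J).astype(numpy.uint8)
    with numpy.errstate(divide="ignore", invalid="ignore"):
        # Integer floor division; a zero J (black pixel) yields 0.
        C = (255 * (J - bgr16[..., 2]) // J).astype(numpy.uint8)
        M = (255 * (J - bgr16[..., 1]) // J).astype(numpy.uint8)
        Y = (255 * (J - bgr16[..., 0]) // J).astype(numpy.uint8)
    return (C, M, Y, K)
```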
Numba
import numba
import numpy as np

@numba.njit(parallel=True, error_model="numpy", fastmath=True)
def bgr2cmyk_v4(bgr_img):
    bgr_img = np.ascontiguousarray(bgr_img)
    (height, width) = bgr_img.shape[:2]
    CMYK = np.empty((height, width, 4), dtype=np.uint8)
    for i in numba.prange(height):
        for j in range(width):
            B,G,R = bgr_img[i,j] 
            J = max(R, G, B)
            K = np.uint8(255 - J)
            C = np.uint8(255 * (J - R) / J)
            M = np.uint8(255 * (J - G) / J)
            Y = np.uint8(255 * (J - B) / J)
            CMYK[i,j] = (C,M,Y,K)
    return CMYK

Numba will optimize that code beyond simply calling numpy library routines. It will also parallelize the outer loop as indicated. Choosing the numpy error model and allowing fastmath means division by zero does not raise an exception or warning, and it also makes the math a little faster.

On my computer, this performs roughly 40x faster than the code you show in your question (90 ms -> 2 ms).

What else?

I'm not sure if numba leaves any floating point operations in there. The division is technically, by Python semantics, a float division, but replacing it with // (integer division) makes it slower.

I think that numba/LLVM didn't apply SIMD here. Some investigation revealed that the Loop Vectorizer doesn't like any of the instances it was asked to consider.

An OpenCL kernel might be even faster. OpenCL can run on CPUs.

Numba can also use CUDA.
