Fastest way to convert ubyte [0, 255] array to float array [-0.5, +0.5] with NumPy

The question is in the title and it is pretty straightforward.

I have a file f from which I am reading a ubyte array:

images = numpy.fromfile(f, '>u1', size * rows * cols).reshape((size, rows, cols))
max_value = 0xFF  # max value of ubyte

Currently I'm renormalizing the data in 3 passes, as follows:

arr = images.astype(float)
arr -= max_value / 2.0
arr /= max_value

Since the array is somewhat large, this takes a noticeable fraction of a second.
It would be great if I could do this in 1 or 2 passes through the data, as I think that would be faster.

Is there some way for me to perform a "composite" vector operation to decrease the number of passes?
Or, is there some other way for me to speed this up?

I did:

ar = ar - 255/2.
ar *= 1./255

Seems faster :)

Now I've timed it: it's roughly twice as fast on my system. It seems ar = ar - 255/2. does the subtraction and the type conversion in a single pass. Also, division by a scalar does not appear to be optimized: it's faster to do the division once and then a bunch of multiplications on the array, though the extra floating-point operation may increase round-off error.
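
For reference, here is a minimal timing sketch (with a made-up array size, so the absolute numbers are only illustrative) to check the divide-versus-multiply claim on your own machine:

import numpy
import timeit

# hypothetical flat test array; substitute your own data
ar = numpy.random.randint(0, 256, size=10_000_000, dtype=numpy.uint8)

def divide():
    a = ar - 255 / 2.0   # subtraction and cast to float64 in one pass
    a /= 255.0           # in-place division by a scalar
    return a

def multiply():
    a = ar - 255 / 2.0   # same first pass
    a *= 1.0 / 255.0     # multiply by the precomputed reciprocal instead
    return a

print('divide:  ', timeit.timeit(divide, number=10))
print('multiply:', timeit.timeit(multiply, number=10))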

As noted in the comments, numexpr might be a truly fast yet simple way to achieve this. On my system it's another factor of two quicker, but mostly because numexpr uses multiple cores, not so much because it does only a single pass over the array. Code:

import numexpr
ar = numexpr.evaluate('(ar - 255.0/2.0) / 255.0')
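
To see how much of that gain comes from the extra cores rather than from the single pass, numexpr lets you pin the thread count; a quick check (reusing the same ar) might look like this:

import numexpr

old = numexpr.set_num_threads(1)                        # force single-threaded evaluation
single = numexpr.evaluate('(ar - 255.0/2.0) / 255.0')
numexpr.set_num_threads(old)                            # restore the previous thread count
multi = numexpr.evaluate('(ar - 255.0/2.0) / 255.0')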

This lookup table might be a bit faster than the repeated calculation:

table = numpy.linspace(-0.5, 0.5, 256)
images = numpy.memmap(f, '>u1', 'r', shape=(size, rows, cols))
arr = table[images]

On my system, it shaves 10 to 15 percent off the time compared to yours.

I found a better solution myself (around 25% faster):

arr = numpy.memmap(f, '>u1', 'r', shape=(size, rows, cols))
arr = arr / float(max_value)
arr -= 0.5

I'm curious if it can be improved.
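
One further thing worth trying, as a sketch only: if a single-precision result is acceptable (that choice, and the name out, are mine and not part of the question), you can preallocate the output and let the ufunc write into it directly, using the raw uint8 memmap from the first line above as input, so no full-size float64 temporary is materialized:

import numpy

out = numpy.empty(arr.shape, dtype=numpy.float32)   # preallocated single-precision buffer
numpy.divide(arr, float(max_value), out=out)        # cast and divide straight into out
out -= 0.5                                          # second in-place pass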

I get about a 50% speed-up for large arrays using cython.parallel.prange with the code below (written for a one-dimensional array, but easily extensible); the speed-up presumably depends on the number of CPU cores:

pilot.pyx file:

cimport cython
from cython.parallel import prange
import numpy as np
cimport numpy as np
from numpy cimport float64_t, uint8_t, ndarray

@cython.boundscheck(False)
@cython.wraparound(False)
def norm(np.ndarray[uint8_t, ndim=1] img):
    cdef:
        Py_ssize_t i, n = len(img)
        np.ndarray[float64_t, ndim=1] arr = np.empty(n, dtype='float64')
        float64_t * left = <float64_t *> arr.data   # raw pointer to the output buffer
        uint8_t * right = <uint8_t *> img.data      # raw pointer to the input bytes

    # prange releases the GIL and spreads the loop over OpenMP threads
    for i in prange(n, nogil=True):
        left[i] = (right[i] - 127.5) / 255.0        # maps [0, 255] to [-0.5, +0.5]

    return arr

setup.py file to build a C extension module out of the above code:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy

ext_module = Extension(
    'pilot',
    ['pilot.pyx'],
    include_dirs=[numpy.get_include()],   # needed because pilot.pyx cimports numpy
    extra_compile_args=['-fopenmp'],
    extra_link_args=['-fopenmp'],
)

setup(
    name='pilot',
    cmdclass={'build_ext': build_ext},
    ext_modules=[ext_module],
)
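
Assuming the two files above sit in the same directory, a typical way to build and use the module is (a sketch; f, size, rows and cols are the ones from the question):

python setup.py build_ext --inplace

and then, from Python:

import numpy
import pilot

flat = numpy.fromfile(f, '>u1', size * rows * cols)     # 1-D uint8 array, as norm() expects
arr = pilot.norm(flat).reshape((size, rows, cols))      # normalize, then restore the shape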
