
Fastest way to convert ubyte [0, 255] array to float array [-0.5, +0.5] with NumPy

The question is in the title and it is pretty straightforward.

I have a file f from which I am reading a ubyte array:

arr = numpy.fromfile(f, '>u1', size * rows * cols).reshape((size, rows, cols))
max_value = 0xFF  # max value of ubyte

Currently I'm renormalizing the data in 3 passes, as follows:

arr = arr.astype(float)
arr -= max_value / 2.0
arr /= max_value

Since the array is somewhat large, this takes a noticeable fraction of a second.
It would be great if I could do this in 1 or 2 passes through the data, as I think that would be faster.

Is there some way for me to perform a "composite" vector operation to decrease the number of passes?
Or, is there some other way for me to speed this up?

I did:

ar = ar - 255/2.
ar *= 1./255

Seems faster :)

Now I timed it: it's roughly twice as fast on my system. It seems ar = ar - 255/2. does the subtraction and the type conversion on the fly. Also, division by a scalar apparently isn't optimized: it's faster to do the division once and then a bunch of multiplications on the array, though the additional floating point operation may increase round-off error.
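
A minimal timing sketch to reproduce that comparison (the array name img and its size are placeholders, not from the original post):

import timeit
import numpy

img = numpy.random.randint(0, 256, 10**7).astype(numpy.uint8)

def three_pass():
    a = img.astype(float)
    a -= 255 / 2.0
    a /= 255
    return a

def two_pass():
    a = img - 255 / 2.0   # subtraction casts uint8 to float64 on the fly
    a *= 1.0 / 255        # multiply by the reciprocal instead of dividing
    return a

print(timeit.timeit(three_pass, number=10))
print(timeit.timeit(two_pass, number=10))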

As noted in the comments, numexpr might be a truly fast yet simple way to achieve this. On my system it's another factor of two quicker, but mostly because numexpr uses multiple cores, not so much because it does only a single pass over the array. Code:

import numexpr
ar = numexpr.evaluate('(ar - 255.0/2.0) / 255.0')
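
To separate the multi-core effect from the single-pass effect, the thread count can be pinned to one; a quick sketch (assuming ar is the uint8 array from above):

import numexpr
numexpr.set_num_threads(1)   # single thread: isolates the benefit of the single pass
single_threaded = numexpr.evaluate('(ar - 255.0/2.0) / 255.0')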

This lookup table might be a bit faster than the repeated calculation:

table = numpy.linspace(-0.5, 0.5, 256)   # table[i] == (i - 127.5) / 255
images = numpy.memmap(f, '>u1', 'r', shape=(size, rows, cols))
arr = table[images]   # fancy indexing performs the lookup in a single pass

On my system, it shaves 10 to 15 percent off the time compared to yours.
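
As a quick sanity check (a sketch with a small hypothetical test array, not from the original answer), the lookup agrees with the arithmetic version because table[i] equals (i - 127.5) / 255:

import numpy
test = numpy.arange(256, dtype=numpy.uint8)
table = numpy.linspace(-0.5, 0.5, 256)
assert numpy.allclose(table[test], (test - 255 / 2.0) / 255)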

I found a better solution myself (around 25% faster):

arr = numpy.memmap(f, '>u1', 'r', shape=(size, rows, cols))
arr = arr / float(max_value)   # the division casts to float64 in the same pass
arr -= 0.5                     # shift in place

I'm curious if it can be improved.
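
One small variation to try (a sketch, not from the original post; it assumes arr is the memmapped uint8 array from the first line above and reuses size, rows, cols and max_value from the question) is writing the division result into a preallocated buffer, so the only allocation happens once, up front:

out = numpy.empty((size, rows, cols), dtype=numpy.float64)
numpy.divide(arr, float(max_value), out=out)   # one pass: cast and divide
out -= 0.5                                     # second pass, in place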

I get around a 50% speed-up for large arrays using cython.parallel.prange with the code below (written for a one-dimensional array, but easily extensible); I guess the speed-up depends on the number of CPU cores:

pilot.pyx file:

cimport cython
from cython.parallel import prange
import numpy as np
cimport numpy as np
from numpy cimport float64_t, uint8_t, ndarray

@cython.boundscheck(False)
@cython.wraparound(False)
def norm(np.ndarray[uint8_t, ndim=1] img):
    cdef:
        Py_ssize_t i, n = len(img)
        np.ndarray[float64_t, ndim=1] arr = np.empty(n, dtype='float64')
        float64_t * left = <float64_t *> arr.data
        uint8_t * right = <uint8_t *> img.data

    # parallel loop over all elements; nogil releases the GIL so OpenMP threads can run
    for i in prange(n, nogil=True):
        left[i] = (right[i] - 127.5) / 255.0   # map [0, 255] to [-0.5, +0.5]

    return arr

setup.py file to build a C extension module out of the above code:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_module = Extension(
    'pilot',
    ['pilot.pyx'],
    extra_compile_args=['-fopenmp'],
    extra_link_args=['-fopenmp'],
)

setup(
    name = 'pilot',
    cmdclass = {'build_ext': build_ext},
    ext_modules = [ext_module],
)
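
To build and use the extension (a sketch; the in-place build command and the reshape step are my assumptions, not part of the original answer):

# compile the extension first:  python setup.py build_ext --inplace
import numpy as np
from pilot import norm

img = np.fromfile(f, '>u1', size * rows * cols)          # 1-D uint8 data, as in the question
arr = norm(np.ascontiguousarray(img, dtype=np.uint8))     # parallel normalization to [-0.5, +0.5]
arr = arr.reshape((size, rows, cols))                     # restore the original shape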
