
What is the fastest way to transpose and normalize the data in an ndarray containing multiple images?

I have a batch of images, usually 128, that are initially read into a NumPy array of shape 128x360x640x3. I need to transpose each image from HWC to CHW layout (so the batch goes from NHWC to NCHW), which is ndarray.transpose(2,0,1) per image, and normalize the pixels to the [0,1] range by dividing by 255. This batch operation will be repeated periodically, perhaps a hundred or so times. The simplest implementation looks like this:

for i in range(128):
    batchImageDataNew[i,:,:] = batchImageData[i,:,:].transpose(2,0,1)/255.

batchImageDataNew is of type np.float32, whereas batchImageData is np.uint8. I am trying to speed this up as much as possible. I thought ndarray.transpose only rearranges the strides without touching the memory, but I see about 1 ms per image for the transpose alone (120 ms in total). Doing both the transpose and the division brings the total to about 350 ms. What would be the best way to speed this up? Would a combination of Cython and multithreading or multiprocessing help? I am working on Ubuntu, where I have access to OpenMP as well.

EDIT: I tried a simple multiprocessing.Pool implementation which gave me about 270ms for the whole loop, but I'd like to optimize it even further.

def preprocess(i):
    batchImageDataNew[i,:,:] = batchImageData[i,:,:].transpose(2,0,1)/255.


pool = multiprocessing.Pool(8)
pool.map(preprocess, range(128))
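One caveat with the process pool above: each worker process has its own copy of the module-level arrays, so assignments to batchImageDataNew made inside a worker are generally not visible in the parent unless the buffer lives in shared memory. A thread-based variant avoids that, because threads share the output buffer and NumPy releases the GIL inside ufunc loops on numeric dtypes. The following is only a minimal sketch: the helper name preprocess_threaded and the workers parameter are made up for illustration, and whether it is actually faster depends on memory bandwidth.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def preprocess_threaded(src, dst, workers=4):
    """Hypothetical helper: fill dst (N,C,H,W float32) from src (N,H,W,C uint8)."""

    def one(i):
        # The transpose is only a view; the division writes directly into
        # the shared output buffer, so nothing has to be copied back.
        np.divide(src[i].transpose(2, 0, 1), 255, out=dst[i])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces the lazy map to run and re-raises worker exceptions.
        list(pool.map(one, range(src.shape[0])))
    return dst


# Small stand-in batch; the real one would be 128x360x640x3.
rng = np.random.default_rng(0)
small_src = rng.integers(0, 256, size=(8, 6, 10, 3), dtype=np.uint8)
small_dst = np.empty((8, 3, 6, 10), dtype=np.float32)
preprocess_threaded(small_src, small_dst)
```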

Fake data

a = np.array([[[1,1]],[[2,2]],[[3,3]]])
b = a + 10
c = b + 10
d = c + 10
e = np.stack((a,b,c,d))

It is usually better to avoid for loops if you can and operate on the whole array at once:

f = np.transpose(e, (0,3,1,2))
g = f / 255

>>> e.shape
(4, 3, 1, 2)
>>> f.shape
(4, 2, 3, 1)

Alternatively, use np.moveaxis instead of transpose. The two calls below are equivalent:

f = np.moveaxis(e, 3, 1)
f = np.moveaxis(e, (1,2,3), (2,3,1))
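As a quick sanity check, the sketch below confirms that all three spellings give the same result and that the transposed array is only a view on the original data. The array shape is arbitrary and chosen just for illustration. The practical consequence is that the transpose itself costs almost nothing; the time is spent when the non-contiguous view is later copied or divided into a new array.

```python
import numpy as np

# Small NHWC stand-in: 2 images, 4x5 pixels, 3 channels.
e = np.arange(2 * 4 * 5 * 3).reshape(2, 4, 5, 3)

f_t = np.transpose(e, (0, 3, 1, 2))
f_m1 = np.moveaxis(e, 3, 1)
f_m2 = np.moveaxis(e, (1, 2, 3), (2, 3, 1))

# All three produce the same NCHW arrangement.
same_result = np.array_equal(f_t, f_m1) and np.array_equal(f_t, f_m2)

# The result shares memory with e: no element was copied.
is_view = np.shares_memory(e, f_t)

# Only the strides changed, so the view is no longer C-contiguous.
is_contiguous = f_t.flags["C_CONTIGUOUS"]

print(f_t.shape, same_result, is_view, is_contiguous)
```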

A slight ~25% improvement (on my machine) can be achieved by creating an array beforehand to accept the result of the division:

a = np.array(np.random.rand(128,360,640,3)*255,dtype=np.uint8)
b = np.zeros((128,3,360,640), dtype=np.float32)
np.divide(np.moveaxis(a, (1,2,3), (2,3,1)), 255, out=b)
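Before timing it, it may be worth checking on a small batch that the preallocated version produces the same values as the original loop. This is a minimal sketch with a reduced, arbitrary shape; the variable names ref and matches are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(4, 6, 8, 3), dtype=np.uint8)  # small NHWC batch

# Reference: the per-image loop from the question.
ref = np.empty((4, 3, 6, 8), dtype=np.float32)
for i in range(a.shape[0]):
    ref[i] = a[i].transpose(2, 0, 1) / 255.

# Whole-batch version writing into a preallocated float32 buffer.
b = np.empty((4, 3, 6, 8), dtype=np.float32)
np.divide(np.moveaxis(a, (1, 2, 3), (2, 3, 1)), 255, out=b)

matches = np.allclose(ref, b)
print(matches, b.dtype, b.min() >= 0.0, b.max() <= 1.0)
```

np.empty is used instead of np.zeros here because every element is overwritten by the division, so the initial fill is not needed.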

Your problem is highly memory- and cache-dependent. The optimal solution will depend on your processor and RAM speed. This is a solution using Numba, but you can take a quite similar approach using Cython.

Example

import numba as nb
import numpy as np
import time


def tran_scal(batchImageData):
  s=batchImageData.shape
  batchImageDataNew=np.empty((s[0],s[3],s[1],s[2]),dtype=np.float32)
  for i in range(batchImageData.shape[0]):
    batchImageDataNew[i,:,:] = batchImageData[i,:,:].transpose(2,0,1)/255.
  return batchImageDataNew


@nb.njit()
def tran_scal_nb(batchImageData):
  s=batchImageData.shape
  batchImageDataNew=np.empty((s[0],s[3],s[1],s[2]),dtype=np.float32)
  for i in range(batchImageData.shape[0]):
    for j in range(batchImageData.shape[1]):
      for k in range(batchImageData.shape[2]):
        for l in range(batchImageData.shape[3]):
          batchImageDataNew[i,l,j,k] = batchImageData[i,j,k,l]*(1/255.)
  return batchImageDataNew

@nb.njit(parallel=True)
def tran_scal_nb_p(batchImageData):
  s=batchImageData.shape
  batchImageDataNew=np.empty((s[0],s[3],s[1],s[2]),dtype=np.float32)
  for i in nb.prange(batchImageData.shape[0]):
    for j in range(batchImageData.shape[1]):
      for k in range(batchImageData.shape[2]):
        for l in range(batchImageData.shape[3]):
          batchImageDataNew[i,l,j,k] = batchImageData[i,j,k,l]*(1/255.)
  return batchImageDataNew

Timings

Core i7-4xxx
#Test data
data=np.array(np.random.rand(128,360,640,3)*255,dtype=np.uint8)
Your solution:    550ms
@wwii(transpose): 379ms
tran_scal_nb:     190ms 
tran_scal_nb_p:   100ms 

On the first call there is a compilation overhead of about 0.5s, which is not included in the timings.
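To keep that one-time cost out of a measurement, one option is to call the function once before timing it. The sketch below assumes a hypothetical helper time_call and uses a pure-NumPy stand-in (transpose_scale) so that it runs without Numba installed; any of the functions above could be passed in instead. It is not the harness used for the timings listed here.

```python
import time

import numpy as np


def time_call(fn, *args, repeats=5):
    """Hypothetical helper: best wall-clock time of fn(*args), in seconds."""
    # Warm-up call; for a Numba-jitted function this is where compilation happens.
    fn(*args)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def transpose_scale(batch):
    # Pure-NumPy stand-in for the functions being compared.
    return (batch.transpose(0, 3, 1, 2) / 255.).astype(np.float32)


# Reduced test batch so the sketch runs quickly.
data = np.random.randint(0, 256, size=(8, 36, 64, 3), dtype=np.uint8)
elapsed = time_call(transpose_scale, data)
print(f"best of 5 runs: {elapsed * 1e3:.3f} ms")
```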
