Perform normalization using Accelerate framework

I need to perform a simple math operation on a Data value that contains RGB pixel data. Currently I'm doing it like this:

let imageMean: Float = 127.5
let imageStd: Float = 127.5
let rgbData: Data // Some data containing RGB pixels 
let floats = (0..<rgbData.count).map {
    (Float(rgbData[$0]) - imageMean) / imageStd
}
return Data(bytes: floats, count: floats.count * MemoryLayout<Float>.size)

This works, but it's too slow. I was hoping I could use the Accelerate framework to do the calculation faster, but I have no idea how. I reserved some space up front so that it isn't allocated every time this function runs, like so:

// 3 channels (RGB), one Float per channel value
inputBufferDataNormalized = malloc(width * height * 3 * MemoryLayout<Float>.size)

I tried a few functions, such as vDSP_vasm, but I couldn't make them work. Can someone show me how to use them? Basically I need to replace this map call, because it takes too long. Ideally the solution would also reuse the pre-allocated space on every call.

Following up on my comment on your other related question. You can use SIMD to parallelize the operation, but you'd need to split the original array into chunks.

This is a simplified example that assumes the array length is exactly divisible by 64, for example an array of 1024 elements:

let arr: [Float] = (0 ..< 1024).map { _ in Float.random(in: 0...1) }
let imageMean: Float = 127.5
let imageStd: Float = 127.5

var chunks = [SIMD64<Float>]()
chunks.reserveCapacity(arr.count / 64)

for i in stride(from: 0, to: arr.count, by: 64) {
    // Load 64 consecutive elements into one SIMD vector.
    let v = SIMD64<Float>(arr[i ..< i + 64])

    chunks.append((v - imageMean) / imageStd) // same calculation using SIMD
}

You can then read each element of a chunk with a subscript, for example to flatten the chunks back into a single array:

var results: [Float] = []
results.reserveCapacity(arr.count)

for chunk in chunks {
   for i in chunk.indices {
      results.append(chunk[i])
   }
}

Of course, you'd need to deal with a remainder if the array isn't exactly divisible by 64.
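One way to handle that remainder is to process as many full 64-element chunks as fit and finish the tail with an ordinary scalar loop. The following is only a sketch under that assumption; normalizeSIMD is a hypothetical helper name, not something from the standard library:

```swift
/// Hypothetical helper: computes (x - mean) / std for every element,
/// 64 lanes at a time, with a scalar loop for the leftover tail.
func normalizeSIMD(_ values: [Float], mean: Float, std: Float) -> [Float] {
    let lanes = 64
    var results = [Float]()
    results.reserveCapacity(values.count)

    // Largest multiple of 64 that fits in the array.
    let vectorEnd = values.count - values.count % lanes

    for start in stride(from: 0, to: vectorEnd, by: lanes) {
        let v = SIMD64<Float>(values[start ..< start + lanes])
        let normalized = (v - mean) / std
        for lane in normalized.indices {
            results.append(normalized[lane])
        }
    }

    // Remaining 0...63 elements, one at a time.
    for x in values[vectorEnd...] {
        results.append((x - mean) / std)
    }
    return results
}
```

With an input of 130 elements, for instance, the first 128 go through two SIMD64 chunks and the last 2 go through the scalar loop.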

I have found a way to do this using Accelerate. First I reserve space for the converted buffer, like so:

var inputBufferDataRawFloat = [Float](repeating: 0, count: width * height * 3)

Then I can use it like so:

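import Accelerate

// Convert the 8-bit channel values to Float.
let rawBytes = [UInt8](rgbData)
vDSP_vfltu8(rawBytes, 1, &inputBufferDataRawFloat, 1, vDSP_Length(rawBytes.count))

// inputBufferDataRawScalars is defined elsewhere. To get (x - imageMean) / imageStd in the
// form (x + B) * C, .mean must hold -imageMean and .std must hold 1 / imageStd.
vDSP.add(inputBufferDataRawScalars.mean, inputBufferDataRawFloat, result: &inputBufferDataRawFloat)
vDSP.multiply(inputBufferDataRawScalars.std, inputBufferDataRawFloat, result: &inputBufferDataRawFloat)

return Data(bytes: inputBufferDataRawFloat, count: inputBufferDataRawFloat.count * MemoryLayout<Float>.size)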

This runs very fast. There may be a better function in Accelerate; if anyone knows of one, please let me know. It needs to compute (A[n] + B) * C (or, to be exact, (A[n] - B) / C, which can be rewritten in the first form).
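One candidate for doing the arithmetic in a single call is vDSP_vsmsa, which computes D[n] = A[n] * B + C. Since (x - mean) / std equals x * (1 / std) + (-mean / std), the add and the multiply can be folded into one pass. The following is a sketch rather than a benchmarked solution; normalize is a hypothetical helper name, and the plain-loop branch is only there so the code still compiles where Accelerate is unavailable:

```swift
import Foundation
#if canImport(Accelerate)
import Accelerate
#endif

/// Hypothetical helper: writes (byte - mean) / std into a caller-owned buffer.
func normalize(_ bytes: [UInt8], mean: Float, std: Float, into output: inout [Float]) {
    precondition(output.count >= bytes.count, "output buffer is too small")
    #if canImport(Accelerate)
    let n = vDSP_Length(bytes.count)
    // (x - mean) / std == x * scale + offset
    var scale = 1 / std
    var offset = -mean / std
    output.withUnsafeMutableBufferPointer { out in
        guard let base = out.baseAddress else { return }
        vDSP_vfltu8(bytes, 1, base, 1, n)                // UInt8 -> Float
        vDSP_vsmsa(base, 1, &scale, &offset, base, 1, n) // multiply, then add, in place
    }
    #else
    // Fallback without Accelerate: same arithmetic, one element at a time.
    for (i, byte) in bytes.enumerated() {
        output[i] = (Float(byte) - mean) / std
    }
    #endif
}
```

Because the output array is passed in, it can be allocated once (width * height * 3 elements) and reused on every call. If a Data value is needed afterwards, output.withUnsafeBufferPointer { Data(buffer: $0) } should produce it.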
