
How to normalize pixel values of a UIImage in Swift?

We are attempting to normalize a UIImage so that it can be passed correctly into a CoreML model.

The way we retrieve the RGB values from each pixel is by first initializing a [CGFloat] array called rawData, with a position per pixel for each of the red, green, blue and alpha values. In bitmapInfo we describe the raw pixel values coming from the original UIImage itself; this is used to fill the bitmapInfo parameter of context, a CGContext variable. We later use the context variable to draw a CGImage, and eventually convert the normalized CGImage back into a UIImage.

Using a nested for loop iterating through the x and y coordinates, the minimum and maximum pixel color values among all colors (read from the CGFloat raw data array) across all the pixels are found. A bound variable is set to terminate the for loop; otherwise it would throw an out-of-range error.

range indicates the range of possible RGB values (i.e. the difference between the maximum color value and the minimum).

Using the equation to normalize each pixel value:

A = Image
curPixel = current pixel value (R, G, B, or Alpha)
NormalizedPixel = (curPixel - minPixel(A)) / range

A similarly designed nested for loop then parses through the rawData array and modifies each pixel's colors according to this normalization.
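For example, with hypothetical values of minPixel = 10 and maxPixel = 250 found across the image, a single channel value of 130 would normalize to 0.5:

    let minPixel: CGFloat = 10       // hypothetical smallest channel value in the image
    let maxPixel: CGFloat = 250      // hypothetical largest channel value in the image
    let range = maxPixel - minPixel  // 240
    let curPixel: CGFloat = 130      // channel value to normalize
    let normalizedPixel = (curPixel - minPixel) / range  // (130 - 10) / 240 = 0.5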

Most of our code is adapted from:

  1. UIImage to UIColor array of pixel colors
  2. Change color of certain pixels in a UIImage
  3. https://gist.github.com/pimpapare/e8187d82a3976b851fc12fe4f8965789

We use CGFloat instead of UInt8 because the normalized pixel values should be real numbers between 0 and 1, not just 0 or 1.

func normalize() -> UIImage?{

    let colorSpace = CGColorSpaceCreateDeviceRGB()

    guard let cgImage = cgImage else {
        return nil
    }

    let width = Int(size.width)
    let height = Int(size.height)

    var rawData = [CGFloat](repeating: 0, count: width * height * 4)
    let bytesPerPixel = 4
    let bytesPerRow = bytesPerPixel * width
    let bytesPerComponent = 8

    let bitmapInfo = CGImageAlphaInfo.premultipliedLast.rawValue | CGBitmapInfo.byteOrder32Big.rawValue & CGBitmapInfo.alphaInfoMask.rawValue

    let context = CGContext(data: &rawData,
                            width: width,
                            height: height,
                            bitsPerComponent: bytesPerComponent,
                            bytesPerRow: bytesPerRow,
                            space: colorSpace,
                            bitmapInfo: bitmapInfo)

    let drawingRect = CGRect(origin: .zero, size: CGSize(width: width, height: height))
    context?.draw(cgImage, in: drawingRect)

    let bound = rawData.count

    //find minimum and maximum
    var minPixel: CGFloat = 1.0
    var maxPixel: CGFloat = 0.0

    for x in 0..<width {
        for y in 0..<height {

            let byteIndex = (bytesPerRow * x) + y * bytesPerPixel

            if(byteIndex > bound - 4){
                break
            }
            minPixel = min(CGFloat(rawData[byteIndex]), minPixel)
            minPixel = min(CGFloat(rawData[byteIndex + 1]), minPixel)
            minPixel = min(CGFloat(rawData[byteIndex + 2]), minPixel)

            minPixel = min(CGFloat(rawData[byteIndex + 3]), minPixel)


            maxPixel = max(CGFloat(rawData[byteIndex]), maxPixel)
            maxPixel = max(CGFloat(rawData[byteIndex + 1]), maxPixel)
            maxPixel = max(CGFloat(rawData[byteIndex + 2]), maxPixel)

            maxPixel = max(CGFloat(rawData[byteIndex + 3]), maxPixel)
        }
    }

    let range = maxPixel - minPixel
    print("minPixel: \(minPixel)")
    print("maxPixel : \(maxPixel)")
    print("range: \(range)")

    for x in 0..<width {
        for y in 0..<height {
            let byteIndex = (bytesPerRow * x) + y * bytesPerPixel

            if(byteIndex > bound - 4){
                break
            }
            rawData[byteIndex] = (CGFloat(rawData[byteIndex]) - minPixel) / range
            rawData[byteIndex+1] = (CGFloat(rawData[byteIndex+1]) - minPixel) / range
            rawData[byteIndex+2] = (CGFloat(rawData[byteIndex+2]) - minPixel) / range

            rawData[byteIndex+3] = (CGFloat(rawData[byteIndex+3]) - minPixel) / range

        }
    }

    let cgImage0 = context!.makeImage()
    return UIImage.init(cgImage: cgImage0!)
}

Before normalization, we expect the pixel values to range from 0 to 255, and after normalization we expect them to range from 0 to 1.

The normalization formula is able to normalize pixel values to values between 0 and 1. But when we try to print out the pixel values before normalization (simply adding print statements while looping through the pixel values) to verify that we are reading the raw pixel values correctly, we found that the range of those values is off. For example, one pixel has a value of 3.506e+305 (larger than 255). We think we are getting the raw pixel values wrong at the very beginning.

We are not familiar with image processing in Swift and we are not sure if the whole normalization process is right. Any help would be appreciated!

A couple of observations:

  1. Your rawData is a floating point (CGFloat) array, but your context isn't populating it with floating point data, but rather with UInt8 data. If you want a floating point buffer, build a floating point context with CGBitmapInfo.floatComponents and tweak the context parameters accordingly. E.g.:

     func normalize() -> UIImage? {
         let colorSpace = CGColorSpaceCreateDeviceRGB()

         guard let cgImage = cgImage else {
             return nil
         }

         let width = cgImage.width
         let height = cgImage.height

         var rawData = [Float](repeating: 0, count: width * height * 4)
         let bytesPerPixel = 16
         let bytesPerRow = bytesPerPixel * width
         let bitsPerComponent = 32
         let bitmapInfo = CGImageAlphaInfo.premultipliedLast.rawValue | CGBitmapInfo.floatComponents.rawValue | CGBitmapInfo.byteOrder32Little.rawValue

         guard let context = CGContext(data: &rawData,
                                       width: width,
                                       height: height,
                                       bitsPerComponent: bitsPerComponent,
                                       bytesPerRow: bytesPerRow,
                                       space: colorSpace,
                                       bitmapInfo: bitmapInfo) else { return nil }

         let drawingRect = CGRect(origin: .zero, size: CGSize(width: width, height: height))
         context.draw(cgImage, in: drawingRect)

         var maxValue: Float = 0
         var minValue: Float = 1

         for pixel in 0 ..< width * height {
             let baseOffset = pixel * 4
             for offset in baseOffset ..< baseOffset + 3 {
                 let value = rawData[offset]
                 if value > maxValue { maxValue = value }
                 if value < minValue { minValue = value }
             }
         }

         let range = maxValue - minValue
         guard range > 0 else { return nil }

         for pixel in 0 ..< width * height {
             let baseOffset = pixel * 4
             for offset in baseOffset ..< baseOffset + 3 {
                 rawData[offset] = (rawData[offset] - minValue) / range
             }
         }

         return context.makeImage().map {
             UIImage(cgImage: $0, scale: scale, orientation: imageOrientation)
         }
     }
  2. But this begs the question of why you'd bother with floating point data. If you were returning this floating point data back to your ML model, then I can imagine it might be useful, but you're just creating a new image. Because of that, you also have the opportunity to just retrieve the UInt8 data, do the floating point math, update the UInt8 buffer, and create the image from that. Thus:

     func normalize() -> UIImage? {
         let colorSpace = CGColorSpaceCreateDeviceRGB()

         guard let cgImage = cgImage else {
             return nil
         }

         let width = cgImage.width
         let height = cgImage.height

         var rawData = [UInt8](repeating: 0, count: width * height * 4)
         let bytesPerPixel = 4
         let bytesPerRow = bytesPerPixel * width
         let bitsPerComponent = 8
         let bitmapInfo = CGImageAlphaInfo.premultipliedLast.rawValue

         guard let context = CGContext(data: &rawData,
                                       width: width,
                                       height: height,
                                       bitsPerComponent: bitsPerComponent,
                                       bytesPerRow: bytesPerRow,
                                       space: colorSpace,
                                       bitmapInfo: bitmapInfo) else { return nil }

         let drawingRect = CGRect(origin: .zero, size: CGSize(width: width, height: height))
         context.draw(cgImage, in: drawingRect)

         var maxValue: UInt8 = 0
         var minValue: UInt8 = 255

         for pixel in 0 ..< width * height {
             let baseOffset = pixel * 4
             for offset in baseOffset ..< baseOffset + 3 {
                 let value = rawData[offset]
                 if value > maxValue { maxValue = value }
                 if value < minValue { minValue = value }
             }
         }

         let range = Float(maxValue - minValue)
         guard range > 0 else { return nil }

         for pixel in 0 ..< width * height {
             let baseOffset = pixel * 4
             for offset in baseOffset ..< baseOffset + 3 {
                 rawData[offset] = UInt8(Float(rawData[offset] - minValue) / range * 255)
             }
         }

         return context.makeImage().map {
             UIImage(cgImage: $0, scale: scale, orientation: imageOrientation)
         }
     }

     It just depends upon whether you really needed this floating point buffer for your ML model (in which case, you might return the array of floats in the first example, rather than creating a new image) or whether the goal was just to create the normalized UIImage.

     I benchmarked this, and it was a tad faster on an iPhone XS Max than the floating point rendition, and takes a quarter of the memory (e.g. a 2000×2000px image takes 16mb with UInt8, but 64mb with Float).

  3. Finally, I should mention that vImage has a highly optimized function, vImageContrastStretch_ARGB8888, that does something very similar to what we've done above. Just import Accelerate and then you can do something like:

     func normalize3() -> UIImage? {
         let colorSpace = CGColorSpaceCreateDeviceRGB()

         guard let cgImage = cgImage else { return nil }

         var format = vImage_CGImageFormat(bitsPerComponent: UInt32(cgImage.bitsPerComponent),
                                           bitsPerPixel: UInt32(cgImage.bitsPerPixel),
                                           colorSpace: Unmanaged.passRetained(colorSpace),
                                           bitmapInfo: cgImage.bitmapInfo,
                                           version: 0,
                                           decode: nil,
                                           renderingIntent: cgImage.renderingIntent)

         var source = vImage_Buffer()
         var result = vImageBuffer_InitWithCGImage(
             &source,
             &format,
             nil,
             cgImage,
             vImage_Flags(kvImageNoFlags))

         guard result == kvImageNoError else { return nil }

         defer { free(source.data) }

         var destination = vImage_Buffer()
         result = vImageBuffer_Init(
             &destination,
             vImagePixelCount(cgImage.height),
             vImagePixelCount(cgImage.width),
             32,
             vImage_Flags(kvImageNoFlags))

         guard result == kvImageNoError else { return nil }

         result = vImageContrastStretch_ARGB8888(&source, &destination, vImage_Flags(kvImageNoFlags))

         guard result == kvImageNoError else { return nil }

         defer { free(destination.data) }

         return vImageCreateCGImageFromBuffer(&destination, &format, nil, nil, vImage_Flags(kvImageNoFlags), nil).map {
             UIImage(cgImage: $0.takeRetainedValue(), scale: scale, orientation: imageOrientation)
         }
     }

     While this employs a slightly different algorithm, it's worth considering because, in my benchmarking on my iPhone XS Max, it was over 5 times as fast as the floating point rendition.


A few unrelated observations:

  1. Your code snippet is normalizing the alpha channel, too. I'm not sure you'd want to do that. Usually colors and alpha channels are independent. Above, I assume you really wanted to normalize just the color channels. If you want to normalize the alpha channel too, then you might keep a separate min-max range of values for the alpha channel and process it separately, as sketched below. But it doesn't make much sense to normalize the alpha channel with the same range of values as the color channels (or vice versa).
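     As a minimal sketch of that (my own illustration, reusing the rawData, width, and height from the UInt8 rendition above), you could scan and rescale the alpha component with its own range:

         // Scan just the alpha component (the 4th byte of each RGBA pixel).
         var minAlpha: UInt8 = 255
         var maxAlpha: UInt8 = 0
         for pixel in 0 ..< width * height {
             let alpha = rawData[pixel * 4 + 3]
             minAlpha = min(minAlpha, alpha)
             maxAlpha = max(maxAlpha, alpha)
         }

         // Stretch alpha to 0...255 using its own range, independent of the color channels.
         let alphaRange = Float(maxAlpha - minAlpha)
         if alphaRange > 0 {
             for pixel in 0 ..< width * height {
                 let offset = pixel * 4 + 3
                 rawData[offset] = UInt8(Float(rawData[offset] - minAlpha) / alphaRange * 255)
             }
         }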

  2. Rather than using the UIImage width and height, I'm using the values from the CGImage. This is an important distinction in case your images might not have a scale of 1.

  3. You might want to consider an early exit if, for example, the range was already 0-255 (i.e. no normalization needed), as in the sketch below.
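     A minimal sketch of that early exit (my own addition, using the minValue/maxValue variables from the UInt8 rendition above, and assuming the method lives in a UIImage extension) could sit right after the min/max scan:

         // If the color channels already span the full 0...255 range,
         // normalization would be a no-op, so return the original image unchanged.
         if minValue == 0 && maxValue == 255 {
             return self
         }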
