
Custom CoreML output layer that sums multiArray output

Please bear with me; I'm new to CoreML and machine learning. I have a CoreML model that I converted from a research-paper implementation that used Caffe. It's a CSRNet, the objective being crowd counting. After much wrangling, I'm able to load the MLModel into Python using coremltools, pre-process an image using Pillow, and predict an output. The result is a MultiArray (a density map), which I then process further to derive the actual numerical prediction.

How do I add a custom layer as an output to the model that takes the current output and performs the following functionality? (Essentially, it sums all the values in the MultiArray.) I've read numerous articles and am still at a loss. I'd like to be able to save the model/layer and import it into Xcode, so that the MLModel result is a single numerical value, not a MultiArray.

This is the code I'm currently using to convert the output from the model into a number (in Python):

# predict output
output = model.predict({'data': img})

# model.predict returns a dict of {output name: MultiArray};
# summing over the density map yields the crowd-count estimate
summed_output = sum(output.values())
prediction = np.sum(summed_output)
print("prediction: ", prediction)

Full (abbreviated) code:

import coremltools as ct
from PIL import Image
import numpy as np

# instantiate model (CSRNet)
model = ct.models.MLModel('shanghai_b.mlmodel')

# function to resize image
def load_image(path, resize_to=None):
    img = Image.open(path)  # 'Image' comes from 'from PIL import Image'
    if resize_to is not None:
        # note: Image.ANTIALIAS is deprecated in newer Pillow; use Image.LANCZOS there
        img = img.resize(resize_to, Image.ANTIALIAS)
    img_np = np.array(img).astype(np.float32)
    return img_np, img

# select image 
image = 'IMG_173.jpg'

#resize image
_, img = load_image(image, resize_to=(900, 675)) 

# predict output
output = model.predict({'data': img})
summed_output = sum(output.values())
prediction = np.sum(summed_output)
print("prediction: ", prediction)

Xcode shows the output of the MLModel as "MultiArray (Double 1 x 168 x 225)". The spec description for the same model, as it currently stands when I load it into Python with coremltools, is as follows:

<bound method MLModel.predict of input {
  name: "data"
  type {
    imageType {
      width: 224
      height: 224
      colorSpace: RGB
    }
  }
}
output {
  name: "estdmap"
  type {
    multiArrayType {
      dataType: DOUBLE
    }
  }
}
>

Thanks for any help. I'm happy to post any other code from the process if it's useful.

P.S. I'm also adding the code from my Xcode project as a reference.

private func detectImage(_ image: CIImage) {

    guard let model = try? VNCoreMLModel(for: HundredsPredictor().model) else {
        fatalError("Loading to CoreML failed")
    }

    let modelRequest = VNCoreMLRequest(model: model) { (request, error) in
        if error != nil {
            print(error?.localizedDescription ?? "Error")
        } else {
            guard let result = request.results as? [VNObservation] else { fatalError("Error") }

            if #available(iOS 14.0, *) {
                print(result)
                // output: [<VNCoreMLFeatureValueObservation: 0x282069da0> 344A87BC-B13E-4195-922E-7381694C91FF requestRevision=1 confidence=1.000000 timeRange={{0/1 = 0.000}, {0/1 = 0.000}} "density_map" - "MultiArray: Double 1 × 168 × 225 array" (1.000000)]
            } else {
                // Fallback on earlier versions
            }

            if let firstResult = result.first {
                print(firstResult)
                // output: <VNCoreMLFeatureValueObservation: 0x282069da0> 344A87BC-B13E-4195-922E-7381694C91FF requestRevision=1 confidence=1.000000 timeRange={{0/1 = 0.000}, {0/1 = 0.000}} "density_map" - "MultiArray: Double 1 × 168 × 225 array" (1.000000)
            }
        }
    }

    let handler = VNImageRequestHandler(ciImage: image)
    do {
        try handler.perform([modelRequest])
        print(handler)
    } catch let error as NSError {
        print(error)
    }
}

Update: Solution

In Python:

import coremltools as ct

from helpers import get_nn
# helper file sourced from Matthijs Hollemans' GitHub
# url: https://github.com/hollance/coreml-survival-guide/blob/master/Scripts/helpers.py

# load original model
spec = ct.utils.load_spec("HundredsPredictor.mlmodel")
nn = get_nn(spec)

# construct new layer
new_layer = nn.layers.add()
new_layer.name = "summingLayer"

# a reduce layer that SUMs over the C, H and W axes
params = ct.proto.NeuralNetwork_pb2.ReduceLayerParams
new_layer.reduce.mode = params.SUM
new_layer.reduce.axis = params.CHW

# splice the new layer into the graph: it takes over the model's
# original output name, and the previous last layer's output is
# renamed so that it feeds the new layer instead
new_layer.output.append(nn.layers[-2].output[0])
nn.layers[-2].output[0] = nn.layers[-2].name + "_output"
new_layer.input.append(nn.layers[-2].output[0])

# the declared output is now a single value
spec.description.output[0].type.multiArrayType.shape[0] = 1

# save new model
ct.models.utils.save_spec(spec, "HundredPredictorSummed.mlmodel")

In Swift, after importing the updated model:

private func detectImage(_ image: CIImage) {

    guard let model = try? VNCoreMLModel(for: HundredPredictorSummed().model) else {
        fatalError("Loading to CoreML failed")
    }

    let request = VNCoreMLRequest(model: model) { [weak self] request, error in
        guard let results = request.results as? [VNCoreMLFeatureValueObservation],
              let topResult = results.first else {
            fatalError("Unexpected result type from VNCoreMLRequest")
        }

        DispatchQueue.main.async {
            guard let data = topResult.featureValue.multiArrayValue else { return }

            // the reduce layer collapsed the density map to a single
            // value, so the sum is the first (and only) element
            let ptr = data.dataPointer.assumingMemoryBound(to: Double.self)
            let sum = ptr[0]
            print("SUM: ", sum)

            self?.detectLabel.text = "~\(String(Int(round(sum)))) ppl"
        }
    }

    let handler = VNImageRequestHandler(ciImage: image)

    DispatchQueue.global(qos: .userInteractive).async {
        do {
            try handler.perform([request])
        } catch {
            print(error)
        }
    }
}

You can add a ReduceSumLayerParams to the end of the model. You'll need to do this in Python by hand. If you set its reduceAll parameter to true, it will compute the sum over the entire tensor.
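For reference, a minimal, untested sketch of that reduceAll variant, reusing the splicing pattern from the solution above. The layer name "totalSum" and the output filename are made up here, and ND reduce layers such as ReduceSumLayerParams require specification version 4 (iOS 13) or later:

import coremltools as ct
from helpers import get_nn

spec = ct.utils.load_spec("HundredsPredictor.mlmodel")
nn = get_nn(spec)

# hypothetical layer name; reduceAll=True sums every element of
# the input tensor down to a single value
new_layer = nn.layers.add()
new_layer.name = "totalSum"
new_layer.reduceSum.reduceAll = True

# splice it in, as in the solution above: the new layer takes
# over the model's original output name
new_layer.output.append(nn.layers[-2].output[0])
nn.layers[-2].output[0] = nn.layers[-2].name + "_output"
new_layer.input.append(nn.layers[-2].output[0])

# ND reduce layers need spec version 4 (iOS 13) or later
spec.specificationVersion = max(spec.specificationVersion, 4)

ct.models.utils.save_spec(spec, "HundredPredictorReduceAll.mlmodel")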

However, in my opinion, it's just as easy to use the model as-is, and in your Swift code grab a pointer to the MLMultiArray's data and use vDSP.sum(a) to compute the sum.
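A sketch of that approach, assuming the output MultiArray holds Doubles as the spec above declares (sumOfDensityMap is a hypothetical helper name):

import Accelerate
import CoreML

// hypothetical helper: sums all elements of an MLMultiArray
// using vDSP (iOS 13+); assumes the array's dataType is .double
func sumOfDensityMap(_ density: MLMultiArray) -> Double {
    let ptr = density.dataPointer.assumingMemoryBound(to: Double.self)
    let buffer = UnsafeBufferPointer(start: ptr, count: density.count)
    return vDSP.sum(buffer)
}

This avoids editing the model at all, and the full density map stays available in case you later want to visualize it.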
