
Working with a Multidimensional Array as a CoreML Model output

I have trained an object detection CoreML model using Microsoft's customvision.ai service. I exported it to use in my app to recognize certain objects in real time using the camera. However, the CoreML model outputs a MultiArray of type Double. I have no idea how to decipher or use this data, as it is my first time working with multidimensional arrays. I have been trying to find out what a Custom Vision object detection model is supposed to output (such as a CGRect or a UIImage) so I know what I am trying to convert my MultiArray to, but cannot find this information anywhere on Microsoft's website. Microsoft seems to have a demo app for image classification models but nothing for object detection models.

To get a sense of what might be in the multidimensional array, I have tried printing it out and got this result:

Double 1 x 1 x 40 x 13 x 13 array

I have also tried printing the .strides element of the multidimensional array and got this:

[6760, 6760, 169, 13, 1]

I don't know if this info is actually useful, just wanted to give you guys everything I have done so far.

So, my question is: what information does this MultiArray hold (is it something like a UIImage or a CGRect, or something different?), and how can I convert this multidimensional array into a useful set of data that I can actually use?

I haven't used the customvision.ai service, but I've worked with object detection models before. The 13x13 array is most likely a grid that covers the input image. For each cell in this array -- usually corresponding to a block of 32x32 pixels in the original image -- there is a prediction of 40 numbers.

What those 40 numbers mean depends a little on what sort of model customvision.ai uses, but typically they contain coordinates for one or more bounding boxes as well as class probabilities.

In case the model is YOLO (which seems likely, as that also has a 13x13 output grid), there are multiple predictions per cell. Each prediction has 4 numbers to describe a bounding box, 1 number for the probability that this bounding box contains an object, and num_classes numbers with the probabilities for the different classes.

So there are (5 + num_classes) x num_predictions numbers per grid cell. If the model makes 5 predictions per grid cell and you have trained on 3 classes, you get (5 + 3) * 5 = 40 numbers per grid cell.

Note that I'm making a lot of assumptions here because I don't know anything about your model type and how many classes of objects you trained on.

Those 40 numbers may or may not have real bounding box coordinates yet. You may need to write additional code to "decode" these numbers. Again, the logic for this depends on the model type.

I'd assume customvision.ai has some documentation or sample code on how to do this.
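To make that decoding step more concrete, here is a minimal Swift sketch of what it can look like for a YOLOv2-style 1 x 1 x 40 x 13 x 13 output, assuming 5 anchor boxes per cell, 3 classes, a 416x416 input (so each cell covers 32 pixels), and channels ordered tx, ty, tw, th, objectness, then the class scores for each box. The anchor sizes, class count, and channel ordering below are assumptions for illustration, not something your exported model is guaranteed to use:

import CoreML
import CoreGraphics
import Foundation

// All of these constants are assumptions for illustration only.
let gridSize = 13              // 13x13 output grid
let boxesPerCell = 5           // assumed number of anchor boxes per cell
let numClasses = 3             // assumed number of trained classes
let blockSize = 32.0           // assumed pixels covered by one grid cell (416x416 input)
let anchors: [(w: Double, h: Double)] = [          // assumed YOLOv2-style anchor sizes
    (1.08, 1.19), (3.42, 4.41), (6.63, 11.38), (9.42, 5.11), (16.62, 10.52)
]

struct Detection {
    let classIndex: Int
    let confidence: Double
    let rect: CGRect           // in input-image pixel coordinates
}

func sigmoid(_ x: Double) -> Double { 1 / (1 + exp(-x)) }

func decode(output: MLMultiArray, threshold: Double = 0.3) -> [Detection] {
    var detections: [Detection] = []
    for cy in 0..<gridSize {
        for cx in 0..<gridSize {
            for b in 0..<boxesPerCell {
                let channelOffset = b * (5 + numClasses)
                // Reads channel `c` at grid cell (cx, cy); shape is assumed to be 1 x 1 x 40 x 13 x 13.
                func value(_ c: Int) -> Double {
                    output[[0, 0, NSNumber(value: channelOffset + c),
                            NSNumber(value: cy), NSNumber(value: cx)]].doubleValue
                }
                let objectness = sigmoid(value(4))

                // Bounding box center and size, converted to input-image pixels.
                let x = (Double(cx) + sigmoid(value(0))) * blockSize
                let y = (Double(cy) + sigmoid(value(1))) * blockSize
                let w = exp(value(2)) * anchors[b].w * blockSize
                let h = exp(value(3)) * anchors[b].h * blockSize

                // Softmax over the class scores, then pick the best class.
                let scores = (0..<numClasses).map { value(5 + $0) }
                let maxScore = scores.max() ?? 0
                let expScores = scores.map { exp($0 - maxScore) }
                let sumExp = expScores.reduce(0, +)
                var bestClass = 0
                var bestProb = 0.0
                for (i, e) in expScores.enumerated() where e / sumExp > bestProb {
                    bestClass = i
                    bestProb = e / sumExp
                }

                let confidence = objectness * bestProb
                if confidence > threshold {
                    detections.append(Detection(
                        classIndex: bestClass,
                        confidence: confidence,
                        rect: CGRect(x: x - w / 2, y: y - h / 2, width: w, height: h)))
                }
            }
        }
    }
    // A real implementation would also run non-maximum suppression on `detections`.
    return detections
}

Incidentally, the strides you printed, [6760, 6760, 169, 13, 1], are consistent with this layout: channel c at cell (x, y) lives at flat offset c * 169 + y * 13 + x, and 40 * 169 = 6760.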

You can also read more about this topic in several of my blog posts.

9 months later, I stumbled upon your question while trying to solve this exact problem. Having found the solution today, I thought I'd post it up.

Have a look at this GitHub sample:

https://github.com/Azure-Samples/cognitive-services-ios-customvision-sample/tree/master/CVS_ObjectDetectorSample_Swift

It makes use of a CocoaPod named MicrosoftCustomVisionMobile.

That CocoaPod contains the CVSInference framework, which has a class, CVSObjectDetector, that will do all the heavy lifting of parsing the 3-dimensional MLMultiArray output for you. All you need to do is feed it the UIImage for detection and run the inference. Then you can read the detected identifiers, their bounding boxes, and confidences using the strongly typed properties of CVSObjectDetector. Make sure you transform the coordinates back to your view space before drawing!
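A rough sketch of that flow in Swift might look something like the following. The initializer, property, and method names here are assumptions rather than the pod's confirmed API, so treat the linked sample project as the source of truth:

import UIKit
// The module name exposed by the MicrosoftCustomVisionMobile pod may differ; this import is an assumption.
// import CVSInference

func detect(in image: UIImage) {
    // Hypothetical names -- check CVSObjectDetector in the sample project for the real API.
    let detector = CVSObjectDetector()   // assumed initializer; the real one may take a model or config
    detector.image = image               // feed the UIImage to run detection on (assumed property)
    detector.run()                       // run the inference (assumed method)

    // Read the strongly typed results (assumed property names).
    for i in 0..<detector.identifiers.count {
        let label = detector.identifiers[i]        // detected class identifier
        let confidence = detector.confidences[i]   // confidence for this detection
        let box = detector.boundingBoxes[i]        // bounding box, likely in normalized coordinates
        // Transform `box` into your view's coordinate space before drawing it.
        print(label, confidence, box)
    }
}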

If you are working in Xamarin like me, you could use sharpie to create a C# binding for the pod and you'll be in business.

That's quite a late answer, but I faced the same issue and this is my solution. You should get your prediction with something similar to this:

guard let modelOutput = try? model.prediction(input: modelInput) else {
    fatalError("Unexpected runtime error.")
}

Then, look at the output name defined in your model (here the name is "Identity").

You should be able to access the data in the Multidimensional Array like so:

// Iterate over the flattened output and print each value.
for i in 0..<modelOutput.Identity.count {
    print(modelOutput.Identity[i].floatValue)
}
