
Specifying Input/Output dimensions for CoreML 2 model with Flexible Shapes

I managed to create a CoreML 2.0 model with flexible input/output shape sizes:

(screenshot: the model's description in Xcode, showing the flexible input/output size ranges)

I can't figure out how to set the size in my Xcode project, however. If I set the input pixel buffer size to 2048x2048, the output pixel buffer is still 1536x1536. If I set it to 768x768, the resulting pixel buffer is still 1536x1536, but is blank outside the 768x768 region.

I examined the automatically generated Swift model class and don't see any clues there.

I can't find a single example anywhere showing how to use the "Flexibility" sizes.

In the WWDC 2018 Session 708 "What's New in Core ML", Part 1, it states:

This means that now you have to ship a single model. You don't have to have any redundant code. And if you need to switch between standard definition and high definition, you can do it much faster, because we don't need to reload the model from scratch; we just need to resize it. You have two options to specify the flexibility of the model. You can define a range for its dimensions, so you can define a minimal width and height and the maximum width and height, and then at inference pick any value in between. But there is also another way: you can enumerate all the shapes that you are going to use. For example, all different aspect ratios, all different resolutions, and this is better for performance. Core ML knows more about your use case earlier, so it has the opportunity to perform more optimizations.

They say "we just need to resize it". It's so frustrating, because they don't tell you how to just resize it! They also say "and then at inference pick any value in between", but offer no clue how to pick the value in between!

Here is how I added the flexible shape sizes:

import coremltools
from coremltools.models.neural_network import flexible_shape_utils

spec = coremltools.utils.load_spec('mymodel_fxedShape.mlmodel')

# Declare a continuous 640-2048 size range for both width and height,
# on the input image and the output image
img_size_ranges = flexible_shape_utils.NeuralNetworkImageSizeRange()
img_size_ranges.add_height_range(640, 2048)
img_size_ranges.add_width_range(640, 2048)
flexible_shape_utils.update_image_size_range(spec, feature_name='inputImage', size_range=img_size_ranges)
flexible_shape_utils.update_image_size_range(spec, feature_name='outputImage', size_range=img_size_ranges)

coremltools.utils.save_spec(spec, 'myModel.mlmodel')
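For reference, the "enumerate all the shapes" option the WWDC session mentions maps to `flexible_shape_utils.add_enumerated_image_sizes` in coremltools. This is an untested sketch against the same model files as above (I haven't run this exact snippet, and the chosen sizes are just examples):

```python
import coremltools
from coremltools.models.neural_network import flexible_shape_utils

spec = coremltools.utils.load_spec('mymodel_fxedShape.mlmodel')

# Enumerate the exact sizes you plan to use instead of a continuous range;
# per the session, this gives Core ML more opportunity to optimize.
sizes = [flexible_shape_utils.NeuralNetworkImageSize(height=h, width=w)
         for (h, w) in [(768, 768), (1536, 1536), (2048, 2048)]]
flexible_shape_utils.add_enumerated_image_sizes(spec, feature_name='inputImage', sizes=sizes)
flexible_shape_utils.add_enumerated_image_sizes(spec, feature_name='outputImage', sizes=sizes)

coremltools.utils.save_spec(spec, 'myModel_enumerated.mlmodel')
```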

Here is the description of the model:

description {
  input {
    name: "inputImage"
    shortDescription: "Image to stylize"
    type {
      imageType {
        width: 1536
        height: 1536
        colorSpace: BGR
        imageSizeRange {
          widthRange {
            lowerBound: 640
            upperBound: 2048
          }
          heightRange {
            lowerBound: 640
            upperBound: 2048
          }
        }
      }
    }
  }
  output {
    name: "outputImage"
    shortDescription: "Stylized image"
    type {
      imageType {
        width: 1536
        height: 1536
        colorSpace: BGR
        imageSizeRange {
          widthRange {
            lowerBound: 640
            upperBound: 2048
          }
          heightRange {
            lowerBound: 640
            upperBound: 2048
          }
        }
      }
    }
  }
}

There are two layers using "outputShape":

layers {
    name: "SpatialFullConvolution_63"
    input: "Sequential_53"
    output: "SpatialFullConvolution_63_output"
    convolution {
      outputChannels: 16
      kernelChannels: 32
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 2
      stride: 2
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      isDeconvolution: true
      hasBias: true
      weights {
      }
      bias {
      }
      outputShape: 770
      outputShape: 770
    }
  }
  ...relu layer...
  layers {
    name: "SpatialFullConvolution_67"
    input: "ReLU_66"
    output: "SpatialFullConvolution_67_output"
    convolution {
      outputChannels: 8
      kernelChannels: 16
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 2
      stride: 2
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      isDeconvolution: true
      hasBias: true
      weights {
      }
      bias {
      }
      outputShape: 1538
      outputShape: 1538
    }
  }
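As a rough sanity check (my own arithmetic, not from the model), the natural spatial output of a "valid"-padded transposed convolution is (in - 1) * stride + kernel, so a deconvolution's output size normally scales with its input. The baked-in values 770 and 1538 also fold in converter-specific output padding, so they don't match this formula exactly; the point is that a hard-coded outputShape stays pinned to the original 1536-pixel graph no matter what input size you feed in:

```python
def deconv_output_size(n, kernel=3, stride=2):
    """Natural output size of a 'valid' transposed convolution
    when no explicit outputShape is forced on the layer."""
    return (n - 1) * stride + kernel

# With the 1536x1536 pipeline the spatial size is roughly halved twice on the
# way down, then doubled back up by the two deconvolutions:
print(deconv_output_size(384))   # 769: tracks a ~384-wide intermediate input
print(deconv_output_size(769))   # 1539: second upsampling step

# A 768x768 input would reach the first deconvolution at ~192, so its natural
# output would shrink accordingly; a fixed outputShape of 770 cannot follow it.
print(deconv_output_size(192))   # 385
```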

I am now trying to figure out how to remove the outputShape from those two layers.

>>> layer = spec.neuralNetwork.layers[49]
>>> layer.convolution.outputShape
[1538L, 1538L]

I tried setting it to []:

layer.convolution.outputShape = []

To a Shape:

layer.convolution.outputShape = flexible_shape_utils.Shape(())

Whatever I try, I get the error:

TypeError: Can't set composite field

Do I have to create a new layer and then link it to the layer that feeds into it and the layer it outputs to?

The issue in this case was that the model contained layers that used a fixed shape for their outputShape. For example:

>>> layer = spec.neuralNetwork.layers[49]
>>> layer.convolution.outputShape
[1538L, 1538L]

The model in question was indeed fully convolutional, so before conversion to CoreML it worked with any input and output shapes.

I was able to delete the fixed outputShape with this command:

layer = spec.neuralNetwork.layers[49]
del layer.convolution.outputShape[:]
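This works because `outputShape` is a protobuf repeated field: assigning a whole new value to it raises the "Can't set composite field" TypeError seen above, but it supports the same in-place slice deletion as an ordinary Python list. A plain list stands in for the real `layer.convolution.outputShape` here, just to illustrate the idiom:

```python
# Stand-in for the repeated field layer.convolution.outputShape
output_shape = [1538, 1538]

# output_shape = [] would rebind the name (and is what protobuf rejects on a
# repeated field); del output_shape[:] empties the existing container in place.
del output_shape[:]

print(output_shape)   # []
```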

After doing that, the model worked with flexible input and output shapes.

All credit for this answer goes to Matthijs Hollemans.
