
Incorrect frame of boundingBox with VNRecognizedObjectObservation

I'm having an issue with displaying a bounding box around a recognized object using Core ML & Vision.

Horizontal detection seems to work correctly; vertically, however, the box is too tall: it goes over the top edge of the video, doesn't reach all the way to the bottom of the video, and doesn't follow the camera's motion correctly. You can see the issue here: https://imgur.com/Sppww8T

This is how video data output is initialized:

// Deliver frames as bi-planar YCbCr and drop late frames.
let videoDataOutput = AVCaptureVideoDataOutput()
videoDataOutput.alwaysDiscardsLateVideoFrames = true
videoDataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)]
videoDataOutput.setSampleBufferDelegate(self, queue: dataOutputQueue!)
self.videoDataOutput = videoDataOutput
session.addOutput(videoDataOutput)
// Rotate the delivered buffers to portrait.
let c = videoDataOutput.connection(with: .video)
c?.videoOrientation = .portrait
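
For context, videoPreviewLayer, which comes up further below, isn't shown in the post; a typical setup would be something like this sketch (the .resizeAspect gravity is an assumption, based on the video layer being a bit shorter than the view):

// Sketch (assumption, not from the post): a preview layer over the video view,
// letterboxed with .resizeAspect so the video keeps its aspect ratio.
let videoPreviewLayer = AVCaptureVideoPreviewLayer(session: session)
videoPreviewLayer.videoGravity = .resizeAspect
videoPreviewLayer.frame = view.bounds
view.layer.addSublayer(videoPreviewLayer)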

I've also tried other video orientations, without much success.

Performing the Vision request:

let handler = VNImageRequestHandler(cvPixelBuffer: image, options: [:])
try? handler.perform(vnRequests)
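
Note that the handler can also be told the buffer's orientation explicitly. Since the connection above already rotates the buffers to portrait, the default (.up) should be correct here; the variant below is only a sketch of the alternative, where the connection is left alone and the rotation is declared to Vision instead (.right is an assumption for the back camera held in portrait):

// Sketch: declare the orientation to Vision instead of rotating the connection.
// .right is an assumption for the back camera in portrait, not from the post.
let handler = VNImageRequestHandler(cvPixelBuffer: image,
                                    orientation: .right,
                                    options: [:])
try? handler.perform(vnRequests)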

And finally, once the request is processed, the result is converted for display. viewRect is set to the size of the video view, 812×375 (I know the video layer itself is a bit shorter, but that's not the issue here):

let observationRect = VNImageRectForNormalizedRect(observation.boundingBox, Int(viewRect.width), Int(viewRect.height))

I've also tried doing something like this (with more issues):

var observationRect = observation.boundingBox
observationRect.origin.y = 1.0 - observationRect.origin.y
observationRect = videoPreviewLayer.layerRectConverted(fromMetadataOutputRect: observationRect)
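
One likely problem with that attempt: observation.boundingBox has its origin at the bottom-left corner of the box, so flipping only origin.y leaves the rect displaced by its own height. A full flip into top-left metadata-output coordinates would look something like this sketch:

// Sketch: flip the whole rect, not just its origin.
// Vision's boundingBox origin is the box's bottom-left corner.
var flipped = observation.boundingBox
flipped.origin.y = 1.0 - flipped.origin.y - flipped.height
let converted = videoPreviewLayer.layerRectConverted(fromMetadataOutputRect: flipped)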

I've tried to cut out as much irrelevant code as possible.

I've actually come across a similar issue using Apple's sample code, where the bounding box wouldn't fit vertically around objects as expected: https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture. Maybe that means there is some issue with the API?

Answer: I use something like this:

// Displayed video size: full view width, a 16:9 video shown in portrait
// (height = width * 16 / 9), vertically centered in the view.
let width = view.bounds.width
let height = width * 16 / 9
let offsetY = (view.bounds.height - height) / 2
// Scale the normalized bounding box up to display points.
let scale = CGAffineTransform.identity.scaledBy(x: width, y: height)
// Flip the y-axis (Vision's origin is bottom-left, UIKit's is top-left)
// and shift into the vertically centered video area.
let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -height - offsetY)
let rect = prediction.boundingBox.applying(scale).applying(transform)

This assumes portrait orientation and a 16:9 aspect ratio, and it assumes imageCropAndScaleOption = .scaleFill on the VNCoreMLRequest.
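
For completeness, the resulting rect could then be drawn over the video view, for example with a CAShapeLayer (a sketch; boxLayer is hypothetical and not part of the transform code above):

// Sketch: outline the converted rect over the video view.
let boxLayer = CAShapeLayer()
boxLayer.strokeColor = UIColor.red.cgColor
boxLayer.fillColor = UIColor.clear.cgColor
boxLayer.lineWidth = 2
boxLayer.path = UIBezierPath(rect: rect).cgPath
view.layer.addSublayer(boxLayer)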

Credits: The transform code was taken from this repo: https://github.com/Willjay90/AppleFaceDetection
