
Incorrect frame of boundingBox with VNRecognizedObjectObservation

I'm having an issue with displaying a bounding box around a recognized object using Core ML & Vision.

Horizontal detection seems to work correctly; vertically, however, the box is too tall: it goes over the top edge of the video, doesn't reach all the way to the bottom of the video, and doesn't follow the camera's motion correctly. You can see the issue here: https://imgur.com/Sppww8T

This is how video data output is initialized:

// Deliver frames as bi-planar YCbCr and drop late frames.
let videoDataOutput = AVCaptureVideoDataOutput()
videoDataOutput.alwaysDiscardsLateVideoFrames = true
videoDataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)]
videoDataOutput.setSampleBufferDelegate(self, queue: dataOutputQueue!)
self.videoDataOutput = videoDataOutput
session.addOutput(videoDataOutput)
// Rotate the delivered buffers to portrait.
let c = videoDataOutput.connection(with: .video)
c?.videoOrientation = .portrait
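
For context, videoPreviewLayer, which comes up further below, isn't shown in the post; a typical setup would be something like this sketch (the .resizeAspect gravity is an assumption, based on the video layer being a bit shorter than the view):

// Sketch (assumption, not from the post): a preview layer over the video view,
// letterboxed with .resizeAspect so the video keeps its aspect ratio.
let videoPreviewLayer = AVCaptureVideoPreviewLayer(session: session)
videoPreviewLayer.videoGravity = .resizeAspect
videoPreviewLayer.frame = view.bounds
view.layer.addSublayer(videoPreviewLayer)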

I've also tried other video orientations, without much success.

Performing the Vision request:

let handler = VNImageRequestHandler(cvPixelBuffer: image, options: [:])
try? handler.perform(vnRequests)
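
Note that the handler can also be told the buffer's orientation explicitly. Since the connection above already rotates the buffers to portrait, the default (.up) should be correct here; the variant below is only a sketch of the alternative, where the connection is left alone and the rotation is declared to Vision instead (.right is an assumption for the back camera held in portrait):

// Sketch: declare the orientation to Vision instead of rotating the connection.
// .right is an assumption for the back camera in portrait, not from the post.
let handler = VNImageRequestHandler(cvPixelBuffer: image,
                                    orientation: .right,
                                    options: [:])
try? handler.perform(vnRequests)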

And finally, once the request is processed, the result is converted for display. viewRect is set to the size of the video view, 812×375 (I know the video layer itself is a bit shorter, but that's not the issue here):

let observationRect = VNImageRectForNormalizedRect(observation.boundingBox, Int(viewRect.width), Int(viewRect.height))

I've also tried doing something like this (with more issues):

var observationRect = observation.boundingBox
observationRect.origin.y = 1.0 - observationRect.origin.y
observationRect = videoPreviewLayer.layerRectConverted(fromMetadataOutputRect: observationRect)
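
One likely problem with that attempt: observation.boundingBox has its origin at the bottom-left corner of the box, so flipping only origin.y leaves the rect displaced by its own height. A full flip into top-left metadata-output coordinates would look something like this sketch:

// Sketch: flip the whole rect, not just its origin.
// Vision's boundingBox origin is the box's bottom-left corner.
var flipped = observation.boundingBox
flipped.origin.y = 1.0 - flipped.origin.y - flipped.height
let converted = videoPreviewLayer.layerRectConverted(fromMetadataOutputRect: flipped)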

I've tried to cut out as much irrelevant code as possible.

I've actually come across a similar issue using Apple's sample code, where the bounding box wouldn't fit vertically around objects as expected: https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture. Maybe that means there is some issue with the API?

Answer: I use something like this:

// Displayed video size: full view width, a 16:9 video shown in portrait
// (height = width * 16 / 9), vertically centered in the view.
let width = view.bounds.width
let height = width * 16 / 9
let offsetY = (view.bounds.height - height) / 2
// Scale the normalized bounding box up to display points.
let scale = CGAffineTransform.identity.scaledBy(x: width, y: height)
// Flip the y-axis (Vision's origin is bottom-left, UIKit's is top-left)
// and shift into the vertically centered video area.
let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -height - offsetY)
let rect = prediction.boundingBox.applying(scale).applying(transform)

This assumes portrait orientation and a 16:9 aspect ratio, and it assumes imageCropAndScaleOption = .scaleFill on the VNCoreMLRequest.
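
For completeness, the resulting rect could then be drawn over the video view, for example with a CAShapeLayer (a sketch; boxLayer is hypothetical and not part of the transform code above):

// Sketch: outline the converted rect over the video view.
let boxLayer = CAShapeLayer()
boxLayer.strokeColor = UIColor.red.cgColor
boxLayer.fillColor = UIColor.clear.cgColor
boxLayer.lineWidth = 2
boxLayer.path = UIBezierPath(rect: rect).cgPath
view.layer.addSublayer(boxLayer)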

Credits: The transform code was taken from this repo: https://github.com/Willjay90/AppleFaceDetection
