Wrong offsets when displaying multiple VNRecognizedObjectObservation boundingBoxes using SwiftUI

Question

I am using Vision to detect objects and after getting [VNRecognizedObjectObservation] I transform the normalized rects before showing them:

let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -CGFloat(height))
VNImageRectForNormalizedRect(normalizedRect, width, height) // Displayed with SwiftUI, that's why I'm applying transform
    .applying(transform)

The width and height are from SwiftUI GeometryReader:

Image(...)
    .resizable()
    .scaledToFit()
    .overlay {
        GeometryReader { geometry in // ZStack and ForEach([VNRecognizedObjectObservation], id: \.uuid), then:
            let calculatedRect = calculateRect(boundingBox, geometry)
            Rectangle()
                .frame(width: calculatedRect.width, height: calculatedRect.height)
                .offset(x: calculatedRect.origin.x, y: calculatedRect.origin.y)
        }
    }

The problem is that many boxes are positioned incorrectly (while some are accurate), even on square images.

This is not related to the model, because the same images (with the same MLModel) get fairly accurate bounding boxes when I try them in Xcode's Model Preview section.

Sample Image in my App:

[Image: sample image in my app]

Sample Image in Xcode Preview:

[Image: sample image in Xcode preview]

Update (Minimal Reproducible Example):

Putting this code in ContentView.swift of a macOS SwiftUI project, with YOLOv3Tiny.mlmodel in the project bundle, reproduces the same results.

import SwiftUI
import Vision
import CoreML

class Detection: ObservableObject {
    let imgURL = URL(string: "https://i.imgur.com/EqsxxTc.jpg")! // Xcode preview generates this: https://i.imgur.com/6IPNQ8b.png
    @Published var objects: [VNRecognizedObjectObservation] = []

    func getModel() -> VNCoreMLModel? {
        if let modelURL = Bundle.main.url(forResource: "YOLOv3Tiny", withExtension: "mlmodelc") {
            if let mlModel = try? MLModel(contentsOf: modelURL, configuration: MLModelConfiguration()) {
                return try? VNCoreMLModel(for: mlModel)
            }
        }
        return nil
    }

    func detect() async {
        guard let model = getModel(), let tiff = NSImage(contentsOf: imgURL)?.tiffRepresentation else {
            fatalError("Either YOLOv3Tiny.mlmodel is not in project bundle, or image failed to load.")
            // YOLOv3Tiny: https://ml-assets.apple.com/coreml/models/Image/ObjectDetection/YOLOv3Tiny/YOLOv3Tiny.mlmodel
        }
        let request = VNCoreMLRequest(model: model) { (request, error) in
            DispatchQueue.main.async {
                self.objects = (request.results as? [VNRecognizedObjectObservation]) ?? []
            }
        }
        try? VNImageRequestHandler(data: tiff).perform([request])
    }

    func deNormalize(_ rect: CGRect, _ geometry: GeometryProxy) -> CGRect {
        let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -CGFloat(geometry.size.height))
        return VNImageRectForNormalizedRect(rect, Int(geometry.size.width), Int(geometry.size.height)).applying(transform)
    }
}

struct ContentView: View {
    @StateObject var detection = Detection()

    var body: some View {
        AsyncImage(url: detection.imgURL) { img in
            img.resizable().scaledToFit().overlay {
                GeometryReader { geometry in
                    ZStack {
                        ForEach(detection.objects, id: \.uuid) { object in
                            let rect = detection.deNormalize(object.boundingBox, geometry)
                            Rectangle()
                                .stroke(lineWidth: 2)
                                .foregroundColor(.red)
                                .frame(width: rect.width, height: rect.height)
                                .offset(x: rect.origin.x, y: rect.origin.y)
                        }
                    }
                }
            }
        } placeholder: {
            ProgressView()
        }
        .onAppear {
            Task { await self.detection.detect() }
        }
    }
}

Edit: further testing revealed that Vision returns correct positions, and my deNormalize() function also returns the correct positions and sizes, so the problem has to be related to SwiftUI.

Answer 1

Issue 1

Inside a GeometryReader, the ZStack shrinks to the smallest size that fits its content instead of filling the available space.

Add .border(Color.orange) to the ZStack and you will see how much space it actually occupies.

You can use .frame(maxWidth: .infinity, maxHeight: .infinity) to make the ZStack stretch to take all the available space.

Issue 2

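position vs. offset.

offset moves a view from wherever layout has already placed it (by default a ZStack centers its children), so the shift is applied from that placement and not from the top-left corner.

position works more like an origin. From the documentation:

"Positions the center of this view at the specified coordinates in its parent's coordinate space."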

Issue 3

Because position places the view's center while the rect's origin is its top-left corner, add half the width and half the height to the origin.

Issue 4

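The ZStack needs to be flipped vertically (rotated 180 degrees around the X axis), because Vision's normalized coordinates have their origin at the bottom-left while SwiftUI's origin is at the top-left.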
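
As an alternative to rotating the whole ZStack, the flip and the center adjustment from Issues 3 and 4 can be done with plain arithmetic. The following is a minimal sketch, not the code used in this answer: viewRect(fromNormalized:in:) and positionCenter(of:) are hypothetical helper names, and the scaling is written by hand instead of calling VNImageRectForNormalizedRect so that the snippet depends only on Foundation.

```swift
import Foundation

/// Hypothetical helper: converts a Vision-style normalized rect
/// (values in 0...1, origin at the bottom-left) into a rect in a
/// top-left-origin coordinate space of the given size.
func viewRect(fromNormalized normalized: CGRect, in size: CGSize) -> CGRect {
    let width = normalized.size.width * size.width
    let height = normalized.size.height * size.height
    let x = normalized.origin.x * size.width
    // Flip vertically: the top edge of the box in the normalized space
    // is at origin.y + height, measured from the bottom.
    let normalizedTop = normalized.origin.y + normalized.size.height
    let y = (1 - normalizedTop) * size.height
    return CGRect(x: x, y: y, width: width, height: height)
}

/// Hypothetical helper: the point to pass to .position(x:y:), which
/// places the center of a view rather than its top-left corner.
func positionCenter(of rect: CGRect) -> CGPoint {
    CGPoint(x: rect.origin.x + rect.size.width / 2,
            y: rect.origin.y + rect.size.height / 2)
}
```

With helpers like these, the rectangle could be placed with .position(x: center.x, y: center.y) and the rotation3DEffect would no longer be needed, since the flip is already part of the rect.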

Below is the full code:

import SwiftUI
import Vision
import CoreML

@MainActor
class Detection: ObservableObject {
    //Moved file to assets
    //let imgURL = URL(string: "https://i.imgur.com/EqsxxTc.jpg")! // Xcode preview generates this: https://i.imgur.com/6IPNQ8b.png
    let imageName: String = "EqsxxTc"
    @Published var objects: [VNRecognizedObjectObservation] = []
    
    func getModel() throws -> VNCoreMLModel {
        //Used model directly instead of loading from URL
        let model = try YOLOv3Tiny(configuration: .init()).model
        
        let mlModel = try VNCoreMLModel(for: model)
        
        return mlModel
    }
    
    func detect() async throws {
        let model = try getModel()
        
        guard let tiff = NSImage(named: imageName)?.tiffRepresentation else {
            // YOLOv3Tiny: https://ml-assets.apple.com/coreml/models/Image/ObjectDetection/YOLOv3Tiny/YOLOv3Tiny.mlmodel
            //fatalError("Either YOLOv3Tiny.mlmodel is not in project bundle, or image failed to load.")
            throw AppError.unableToLoadImage
        }
        // Completion handlers do not work with async/await directly; wrap the call in a continuation.
        self.objects = try await withCheckedThrowingContinuation { (cont: CheckedContinuation<[VNRecognizedObjectObservation], Error>) in
            
            let request = VNCoreMLRequest(model: model) { (request, error) in
                if let error = error {
                    cont.resume(throwing: error)
                } else {
                    cont.resume(returning: (request.results as? [VNRecognizedObjectObservation]) ?? [])
                }
            }
            do {
                try VNImageRequestHandler(data: tiff).perform([request])
            } catch {
                cont.resume(throwing: error)
            }
        }
    }
    
    func deNormalize(_ rect: CGRect, _ geometry: GeometryProxy) -> CGRect {
        return VNImageRectForNormalizedRect(rect, Int(geometry.size.width), Int(geometry.size.height))
    }
}

struct ContentView: View {
    @StateObject var detection = Detection()
    
    var body: some View {
        Image(detection.imageName)
            .resizable()
            .scaledToFit()
            .overlay {
                GeometryReader { geometry in
                    ZStack {
                        ForEach(detection.objects, id: \.uuid) { object in
                            let rect = detection.deNormalize(object.boundingBox, geometry)
                            Rectangle()
                                .stroke(lineWidth: 2)
                                .foregroundColor(.red)
                                .frame(width: rect.width, height: rect.height)
                                // Changed from offset to position,
                                // adjusting for center vs. top-left origin
                                .position(x: rect.origin.x + rect.width/2, y: rect.origin.y + rect.height/2)
                        }
                    }
                    //Geometry reader makes the view shrink to its smallest size
                    .frame(maxWidth: .infinity, maxHeight: .infinity)
                    //Flip upside down
                    .rotation3DEffect(.degrees(180), axis: (x: 1, y: 0, z: 0))
                    
                }.border(Color.orange)
            }
        
            .task {
                do {
                    try await self.detection.detect()
                } catch {
                    // Always surface errors to the View so you can tell the user somehow. You don't want crashes or to leave the user waiting for something that has failed.
                    print(error)
                }
            }
    }
}
struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}

enum AppError: LocalizedError {
    case cannotFindFile
    case unableToLoadImage
}

As you may notice, I also changed a few other things; see the comments in the code.
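
One of those changes is wrapping the completion handler in a continuation. Stripped of Vision, the pattern might look like the sketch below. legacyLoad(succeed:completion:) and DemoError are hypothetical stand-ins for a callback-based API and are not part of Vision; the point is only that the continuation is resumed exactly once on every path.

```swift
import Foundation

// Hypothetical error type used only for this illustration.
enum DemoError: Error {
    case failed
}

// Hypothetical callback-based API standing in for a request
// that reports its result through a completion handler.
func legacyLoad(succeed: Bool, completion: (Result<[Int], Error>) -> Void) {
    completion(succeed ? .success([1, 2, 3]) : .failure(DemoError.failed))
}

// Async wrapper: bridges the completion handler to async/await.
func load(succeed: Bool) async throws -> [Int] {
    try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<[Int], Error>) in
        legacyLoad(succeed: succeed) { result in
            // Resume exactly once, with either the value or the error.
            continuation.resume(with: result)
        }
    }
}
```

A caller can then write let values = try await load(succeed: true) and handle failures with an ordinary do/catch.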

Answer 2

Okay, so after a long time troubleshooting, I finally managed to make it work correctly (while still not understanding the reason for the problem)...

The problem was this part:

GeometryReader { geometry in
    ZStack {
        ForEach(detection.objects, id: \.uuid) { object in
            let rect = detection.deNormalize(object.boundingBox, geometry)
            Rectangle()
                .stroke(lineWidth: 2)
                .foregroundColor(.red)
                .frame(width: rect.width, height: rect.height)
                .offset(x: rect.origin.x, y: rect.origin.y)
        }
    }
}

I assumed that, because many Rectangle()s overlap, I needed a ZStack() to layer them. This turned out to be wrong: with .offset() they can overlap without any issue, so removing the ZStack() completely solved the problem:

GeometryReader { geometry in
    ForEach(detection.objects, id: \.uuid) { object in
        let rect = detection.deNormalize(object.boundingBox, geometry)
        Rectangle()
            .stroke(lineWidth: 2)
            .foregroundColor(.red)
            .frame(width: rect.width, height: rect.height)
            .offset(x: rect.origin.x, y: rect.origin.y)
    }
}

What I still don't understand is why moving the ZStack() outside the GeometryReader() also solves the problem, and why some boxes were in the correct positions while others were not:

ZStack {
    GeometryReader { geometry in
        ForEach(detection.objects, id: \.uuid) { object in
            let rect = detection.deNormalize(object.boundingBox, geometry)
            Rectangle()
                .stroke(lineWidth: 2)
                .foregroundColor(.red)
                .frame(width: rect.width, height: rect.height)
                .offset(x: rect.origin.x, y: rect.origin.y)
        }
    }
}
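
A likely explanation, offered as an assumption based on documented layout behavior and not something verified against this project: a ZStack sizes itself to fit its largest child and centers every child inside that area (default alignment is .center), and .offset() then shifts each rectangle from that centered placement. A GeometryReader, by contrast, places its direct children at its top-leading corner. Under that model, only boxes as large as the ZStack start at (0, 0), so only they end up where expected. The sketch below estimates the resulting error per box; zStackPlacementError(for:) is a hypothetical name used for illustration.

```swift
import Foundation

/// Hypothetical model of a center-aligned ZStack: the stack is as wide as
/// its widest child and as tall as its tallest child, and each child is
/// centered inside it before any .offset() is applied.
/// Returns, per child, how far its starting point is from the stack's
/// top-left corner, which is the error added to every offset.
func zStackPlacementError(for sizes: [CGSize]) -> [CGPoint] {
    let stackWidth = sizes.map { $0.width }.max() ?? 0
    let stackHeight = sizes.map { $0.height }.max() ?? 0
    return sizes.map { size in
        CGPoint(x: (stackWidth - size.width) / 2,
                y: (stackHeight - size.height) / 2)
    }
}
```

For example, with boxes of 100 x 80 and 40 x 20, the larger one has no error while the smaller one would be shifted by 30 points on both axes, which matches the observation that some boxes were correct and others were not.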
