
How to convert opencv's face detected bounding box coordinates to dlib's face detected bounding box coordinates?

I ran live-stream face detection using OpenCV's pretrained DNN model and dlib's HOG model. I got detections from several cameras, and the code prints the bounding box coordinates from both OpenCV and dlib. I expected the same results, but they are very different. Is there a way to convert the OpenCV coordinates to dlib's?

I've tried to find a mathematical (linear) model to connect the two, but it didn't work.

import numpy as np
import argparse
import imutils
import pickle
import time
import cv2
import os
import align
import dlib
import datetime

face_detector = dlib.get_frontal_face_detector()
predictor_model = "shape_predictor_68_face_landmarks.dat"
face_aligner = align.AlignDlib(predictor_model)

ap = argparse.ArgumentParser()
ap.add_argument("-d", "--detector", required=True,
    help="path to OpenCV's deep learning face detector")
ap.add_argument("-m", "--embedding-model", required=True,
    help="path to OpenCV's deep learning face embedding model")
ap.add_argument("-r", "--recognizer", required=True,
    help="path to model trained to recognize faces")
ap.add_argument("-l", "--le", required=True,
    help="path to label encoder")
ap.add_argument("-c", "--confidence", type=float, default=0.8,
    help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

print("[INFO] loading face detector...")
protoPath = os.path.sep.join([args["detector"], "deploy.prototxt"])
modelPath = os.path.sep.join([args["detector"],
    "res10_300x300_ssd_iter_140000.caffemodel"])
detector = cv2.dnn.readNetFromCaffe(protoPath, modelPath)
print("[INFO] starting video stream...")
vs = cv2.VideoCapture(0)
time.sleep(2.0)

while True:
    ret, frame = vs.read()
    frame = imutils.resize(frame, width=600)
    (h, w) = frame.shape[:2]
    imageBlob = cv2.dnn.blobFromImage(
        cv2.resize(frame, (300, 300)), 1.0, (300, 300),
        (104.0, 177.0, 123.0), swapRB=False, crop=False)
    detector.setInput(imageBlob)
    detections = detector.forward()
    for i in range(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > args["confidence"]:
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            face = frame[startY:endY, startX:endX]
            (fH, fW) = face.shape[:2]
            if fW < 20 or fH < 20:
                continue
            rgb = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
            # dlib's HOG detector runs on the cropped face region, so its
            # rectangles are relative to the crop, not to the full frame
            detected_faces_dlib = face_detector(rgb, 1)
            # the OpenCV box wrapped as a dlib.rectangle, in full-frame coordinates
            detected_faces = dlib.rectangle(left=startX, top=startY, right=endX, bottom=endY)
            print(detected_faces)
            print(detected_faces_dlib)

Here are the results:

[(333, 191) (490, 414)]
rectangles[[(-22, 47) (150, 202)]]
[(333, 190) (490, 413)]
rectangles[[(-22, 47) (150, 202)]]
[(333, 190) (491, 414)]
rectangles[[(-22, 47) (150, 202)]]
[(334, 191) (491, 416)]
rectangles[[(-22, 47) (150, 202)]]
[(334, 196) (493, 416)]
rectangles[[(-22, 47) (150, 202)]]

I just spent a bunch of time dealing with this. If your goal is to detect facial landmarks on a face that was detected by the DNN detector, your best bet is to retrain shape_predictor_68_face_landmarks.dat using rectangles from the DNN detector.

Following this article as a guide, I wrote a Python script that went through the ibug300 training set, re-detected each face's bounding box with the DNN detector, rewrote the training set's XML file, and then ran the train_shape_predictor script to get a new .dat file.
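
A minimal sketch of that last step, assuming the rewritten XML is saved as training_with_dnn_boxes.xml (a hypothetical filename) and using dlib's Python training API instead of the example script; the option values are illustrative, not the exact settings used:

import dlib

# Example training options only; tune them for your own data.
options = dlib.shape_predictor_training_options()
options.oversampling_amount = 300
options.nu = 0.1
options.tree_depth = 4
options.num_threads = 4
options.be_verbose = True

# "training_with_dnn_boxes.xml" stands for the ibug300 training XML after its
# box entries were replaced with boxes from the DNN detector.
dlib.train_shape_predictor("training_with_dnn_boxes.xml",
                           "dnn_shape_predictor_68_face_landmarks.dat",
                           options)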

The results were very good compared with trying to reshape the "dnn rect" to approximate the "hog box."

One tip before you dive into retraining: the DNN face detector returns rectangles whose width and height vary a great deal, which doesn't work well for shape predictor training. It's better to use a square whose sides are ~1.35 * dnn_rect.width. That seems like a magic number, but it's the average height-to-width ratio of the DNN face detection rects (see the sketch below).
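
For illustration, a minimal sketch of that expansion, assuming (startX, startY, endX, endY) are the corners of a DNN detection and that the same square convention is used at training and inference time:

import dlib

# Sketch: expand a DNN box into a square centered on the same point whose
# side is ~1.35 * the box width (the magic number mentioned above).
def dnn_box_to_square_rect(startX, startY, endX, endY):
    cx = (startX + endX) / 2.0
    cy = (startY + endY) / 2.0
    half = 1.35 * (endX - startX) / 2.0
    return dlib.rectangle(left=int(cx - half), top=int(cy - half),
                          right=int(cx + half), bottom=int(cy + half))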

import numpy as np

# take a bounding box predicted by OpenCV, given as (x, y, w, h),
# and convert it to dlib's (top, right, bottom, left) ordering
def bb_to_rect(bb):
    top = bb[1]
    left = bb[0]
    right = bb[0] + bb[2]
    bottom = bb[1] + bb[3]
    return np.array([top, right, bottom, left])


# take a bounding box predicted by dlib and convert it
# to the (x, y, w, h) format we would normally use
# with OpenCV
def rect_to_bb(rect):

    x = rect.left()
    y = rect.top()
    w = rect.right() - x
    h = rect.bottom() - y

    # return a tuple of (x, y, w, h)
    return (x, y, w, h)
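
A hedged usage sketch tying the two helpers together; here shape_predictor (a loaded dlib.shape_predictor) and rgb_frame (an RGB image) are placeholders, not names from the original post:

import dlib

# (x, y, w, h) box in OpenCV style, e.g. built from the DNN corners
bb = (startX, startY, endX - startX, endY - startY)

top, right, bottom, left = bb_to_rect(bb)
dlib_rect = dlib.rectangle(left=left, top=top, right=right, bottom=bottom)

# feed the converted rectangle to a dlib shape predictor...
shape = shape_predictor(rgb_frame, dlib_rect)

# ...and convert a dlib rectangle back to OpenCV's (x, y, w, h)
(x, y, w, h) = rect_to_bb(dlib_rect)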
