
How to convert opencv's face detected bounding box coordinates to dlib's face detected bounding box coordinates?

I ran a live-stream face detection script using OpenCV's pretrained DNN model and dlib's HOG model. I got detections from several cameras, and the code prints out the bounding box coordinates from both OpenCV and dlib. I was expecting the same results, but they are very different. Is there a way to convert the OpenCV coordinates to dlib's?

I've tried to find a mathematical (linear) model to map between the two, but it didn't work.

import numpy as np
import argparse
import imutils
import pickle
import time
import cv2
import os
import align
import dlib
import datetime

# dlib's HOG face detector and the shape-predictor-based aligner
face_detector = dlib.get_frontal_face_detector()
predictor_model = "shape_predictor_68_face_landmarks.dat"
face_aligner = align.AlignDlib(predictor_model)

ap = argparse.ArgumentParser()
ap.add_argument("-d", "--detector", required=True,
    help="path to OpenCV's deep learning face detector")
ap.add_argument("-m", "--embedding-model", required=True,
    help="path to OpenCV's deep learning face embedding model")
ap.add_argument("-r", "--recognizer", required=True,
    help="path to model trained to recognize faces")
ap.add_argument("-l", "--le", required=True,
    help="path to label encoder")
ap.add_argument("-c", "--confidence", type=float, default=0.8,
    help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

print("[INFO] loading face detector...")
protoPath = os.path.sep.join([args["detector"], "deploy.prototxt"])
modelPath = os.path.sep.join([args["detector"],
    "res10_300x300_ssd_iter_140000.caffemodel"])
detector = cv2.dnn.readNetFromCaffe(protoPath, modelPath)
print("[INFO] starting video stream...")
vs = cv2.VideoCapture(0)
time.sleep(2.0)

while True:
    ret, frame = vs.read()
    frame = imutils.resize(frame, width=600)
    (h, w) = frame.shape[:2]
    imageBlob = cv2.dnn.blobFromImage(
        cv2.resize(frame, (300, 300)), 1.0, (300, 300),
        (104.0, 177.0, 123.0), swapRB=False, crop=False)
    detector.setInput(imageBlob)
    detections = detector.forward()
    for i in range(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > args["confidence"]:
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            # crop the detected face out of the full frame
            face = frame[startY:endY, startX:endX]
            (fH, fW) = face.shape[:2]
            if fW < 20 or fH < 20:
                continue
            rgb = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
            # dlib runs on the cropped face here, so its rectangles are
            # relative to the crop, not to the full frame
            detected_faces_dlib = face_detector(rgb, 1)
            # the opencv box re-expressed as a dlib.rectangle
            detected_faces = dlib.rectangle(left=startX, top=startY, right=endX, bottom=endY)
            print(detected_faces)
            print(detected_faces_dlib)

Here are the results:

[(333, 191) (490, 414)]
rectangles[[(-22, 47) (150, 202)]]
[(333, 190) (490, 413)]
rectangles[[(-22, 47) (150, 202)]]
[(333, 190) (491, 414)]
rectangles[[(-22, 47) (150, 202)]]
[(334, 191) (491, 416)]
rectangles[[(-22, 47) (150, 202)]]
[(334, 196) (493, 416)]
rectangles[[(-22, 47) (150, 202)]]

I just spent a bunch of time dealing with this, and if your goal is to detect facial landmarks on a face that was detected by the dnn detector, your best bet is to retrain shape_predictor_68_face_landmarks.dat using rectangles from the dnn detector.

Following this article as a guide, I wrote a Python script that went through the iBUG 300-W training set, re-detected each face's bounding box, rewrote the training set's XML file, and then ran the train_shape_predictor script to get a new .dat file.
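For reference, a stripped-down sketch of that rewrite-and-retrain loop might look like the code below. The file names, the one-face-per-image assumption, and the default training options are placeholders, not the exact values I used:

import xml.etree.ElementTree as ET
import cv2
import dlib

net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

def dnn_box(image, conf_thresh=0.8):
    # return the highest-confidence (x, y, w, h) box from the dnn detector
    (h, w) = image.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    det = net.forward()
    if det[0, 0, 0, 2] < conf_thresh:   # detections come sorted by score
        return None
    x1, y1, x2, y2 = (det[0, 0, 0, 3:7] * [w, h, w, h]).astype("int")
    return x1, y1, x2 - x1, y2 - y1

tree = ET.parse("labels_ibug_300W_train.xml")   # assumed file name
for image_el in tree.iter("image"):
    img = cv2.imread(image_el.get("file"))
    box = dnn_box(img)
    if box is None:
        continue                        # keep the original box if no detection
    x, y, bw, bh = box
    for box_el in image_el.iter("box"): # overwrite the hog-style box attributes
        box_el.set("left", str(x))
        box_el.set("top", str(y))
        box_el.set("width", str(bw))
        box_el.set("height", str(bh))
tree.write("labels_dnn_train.xml")

options = dlib.shape_predictor_training_options()
dlib.train_shape_predictor("labels_dnn_train.xml",
                           "dnn_shape_predictor_68.dat", options)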

The results were very good compared with trying to reshape the "dnn rect" to approximate the "hog box."

One tip before you dive into retraining: the dnn face detection returns rectangles whose width and height vary a great deal, which doesn't work well for shape predictor training. It's better to use a square whose sides are ~1.35 * dnn_rect.width. That seems like a magic number, but it's the average height-to-width ratio of the dnn face detection rects.
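For example, a square rect centered on the dnn box can be built like this (a sketch; the helper name is mine):

import dlib

def dnn_box_to_square_rect(startX, startY, endX, endY, scale=1.35):
    # center of the dnn box
    cx = (startX + endX) // 2
    cy = (startY + endY) // 2
    # side length ~1.35x the dnn box width, per the ratio above
    half = int((endX - startX) * scale) // 2
    return dlib.rectangle(left=cx - half, top=cy - half,
                          right=cx + half, bottom=cy + half)

The helpers below convert a box between OpenCV's (x, y, w, h) format and dlib's ordering in either direction: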

import numpy as np

# take a bounding box predicted by opencv and convert it
# to dlib's (top, right, bottom, left) ordering
def bb_to_rect(bb):
    top = bb[1]
    left = bb[0]
    right = bb[0] + bb[2]
    bottom = bb[1] + bb[3]
    return np.array([top, right, bottom, left])


# take a bounding box predicted by dlib and convert it
# to the (x, y, w, h) format we would normally use
# with OpenCV
def rect_to_bb(rect):

    x = rect.left()
    y = rect.top()
    w = rect.right() - x
    h = rect.bottom() - y

    # return a tuple of (x, y, w, h)
    return (x, y, w, h)
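
A quick usage example, using the coordinates from the printed output above:

import dlib

startX, startY, endX, endY = 333, 191, 490, 414
rect = dlib.rectangle(left=startX, top=startY, right=endX, bottom=endY)
print(rect_to_bb(rect))                  # (333, 191, 157, 223)
print(bb_to_rect((333, 191, 157, 223)))  # [191 490 414 333]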
