Head Pose Estimation with OpenCV, C++ and Image 2D - Geometric Method - Roll, Yaw and Pitch

I'm trying to find the three angles of a person's face from a 2D image.

I'm using OpenCV with Haar cascades to find the face, eyes, nose and mouth, but I haven't found any geometric method that can help me find the angles X, Y and Z (roll, pitch and yaw).

Could someone show me a method in C++ or Java that works?

Given a single image and no other information, there is no unique solution for the angles. Consider the case of just yaw. Projected onto the 2D plane, yaw shows up as a small change in the projected distance between the eyes and in the placement of the eyes relative to the nose and mouth. That distance is not constant from person to person, however.

One typical way around this is to require that the user 'calibrate' their face by looking directly at the camera to define the nominal '0' angles. At that point you have reference lengths against which you can compare subsequent images.

The lengths alone are still not quite enough information, however, because the amount by which the apparent projected distances change depends on the optics and on the distance of the face from the camera. The optics you usually configure manually; the distance you can estimate by assuming 'average' facial dimensions and assuming the 'nominal' image matches those dimensions perfectly. You can make this adjustable if you find that it over- or under-estimates the rotations for a particular face.

Once you have all these assumptions in place, it's fairly simple geometry. You can estimate roll from the line running from the eyes through the nose to the mouth. You can measure the spacing between the eyes to estimate yaw. Finally, you can estimate pitch using the spacing between eyes and mouth or eyes and nose. Bear in mind that these assumptions work best when the face is still fairly close to nominal.
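A rough C++ sketch of that geometry, under assumptions the answer does not spell out: a near-orthographic projection, a roughly planar face, and a face that stays at about the calibration distance. Under those assumptions a length that lies across the rotation axis shrinks by about the cosine of the rotation angle, so the magnitude of the angle can be recovered with `acos`. The names `Calibration`, `foreshorteningAngle`, `yawMagnitude`, `pitchMagnitude` and `rollFromMidline` are hypothetical, and the length ratio alone cannot tell left from right or up from down.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Point2 { double x, y; };

// Reference lengths (pixels) captured while the user looks straight at the camera.
struct Calibration {
    double eyeDistance;       // between the two eye centers at zero yaw
    double eyeMouthDistance;  // from the eye midpoint to the mouth at zero pitch
};

double dist(Point2 a, Point2 b) { return std::hypot(b.x - a.x, b.y - a.y); }

// Magnitude (radians) of the rotation that would foreshorten refLength to
// curLength, assuming curLength = refLength * cos(angle).
double foreshorteningAngle(double curLength, double refLength) {
    assert(refLength > 0.0);
    // Clamp so measurement noise cannot push the ratio outside acos's domain.
    double ratio = std::max(0.0, std::min(1.0, curLength / refLength));
    return std::acos(ratio);
}

// Yaw magnitude from the shrinkage of the eye spacing.
double yawMagnitude(const Calibration& cal, Point2 leftEye, Point2 rightEye) {
    return foreshorteningAngle(dist(leftEye, rightEye), cal.eyeDistance);
}

// Pitch magnitude from the shrinkage of the eye-midpoint-to-mouth length.
double pitchMagnitude(const Calibration& cal, Point2 leftEye, Point2 rightEye,
                      Point2 mouth) {
    Point2 mid{(leftEye.x + rightEye.x) / 2.0, (leftEye.y + rightEye.y) / 2.0};
    return foreshorteningAngle(dist(mid, mouth), cal.eyeMouthDistance);
}

// Roll (radians): tilt of the eye-midpoint-to-mouth line away from the image's
// vertical axis. Image y is assumed to grow downward; the result is positive
// when the mouth is displaced toward larger x.
double rollFromMidline(Point2 leftEye, Point2 rightEye, Point2 mouth) {
    Point2 mid{(leftEye.x + rightEye.x) / 2.0, (leftEye.y + rightEye.y) / 2.0};
    return std::atan2(mouth.x - mid.x, mouth.y - mid.y);
}
```

Because the cosine is flat near zero, small rotations change the lengths very little, so this estimate is noisy for a face that is close to frontal.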

So, you want to find the orientation (in RPY angles) of a face based on the positions of the nose, eyes and mouth. Assuming that all three features (four points, counting both eyes) are visible, I would use the symmetric features of the face to determine the head's orientation, such as:

A line between the eyes could be used as a reference for one of the axes (for instance the pitch axis). Then, we may assume that the roll axis points in the nose's direction, which can be measured through the positional displacement of the nose from the mid-point between the eyes. And lastly, the yaw could be measured through the distance relation between the mid-point between the eyes, the position of the nose, and the mouth's position.

I do not know the distance relations between the four points of interest, and they probably differ with gender, age, and origin. However, if you can find such a relation, deriving the angles should be mathematically rather straightforward.

Interesting application, by the way!

If you use a cascade classifier to detect the right eye, left eye and nose, calculate the centroid of each feature (the center of its bounding box, i.e. x + width/2, y + height/2). This will give you three xy points on your image.
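As a minimal sketch of the centroid step in C++: the `Box` struct below is a hypothetical stand-in that mirrors the `x`, `y`, `width`, `height` fields of the rectangles a cascade detector reports, so the example compiles without OpenCV. `centerOf` is likewise an illustrative name.

```cpp
#include <cassert>

// Stand-in for a detection rectangle: top-left corner plus size, in pixels.
struct Box { int x, y, width, height; };

struct Point2 { double x, y; };

// Center of the bounding box. Dividing by 2.0 keeps the half pixel that
// integer division would drop for odd widths or heights.
Point2 centerOf(const Box& b) {
    assert(b.width >= 0 && b.height >= 0);
    return Point2{b.x + b.width / 2.0, b.y + b.height / 2.0};
}
```

Running this on the left-eye, right-eye and nose rectangles gives the three points that the following steps compare.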

You can detect roll by looking at the Y value of each eye: if one is higher than the other, the head is tilted in the direction of the lowest Y value (as one eye moves up, the other moves down).
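One way to turn that Y difference into an angle, sketched in C++. This goes slightly beyond the answer, which only compares the two Y values: it uses `atan2` on the vector between the eye centers. It assumes image coordinates with y growing downward, and the parameter names refer to positions in the image (smaller x versus larger x), not to the person's own left and right. `rollDegrees` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

struct Point2 { double x, y; };

// Roll of the eye line in degrees. 0 means the eyes are level; a positive
// value means the eye with the larger image x sits lower in the image.
double rollDegrees(Point2 imageLeftEye, Point2 imageRightEye) {
    assert(imageRightEye.x > imageLeftEye.x);
    const double kPi = std::acos(-1.0);
    double dx = imageRightEye.x - imageLeftEye.x;
    double dy = imageRightEye.y - imageLeftEye.y;
    return std::atan2(dy, dx) * 180.0 / kPi;
}
```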

You can detect yaw by looking at the X value of the nose: if the user looks to their left, the X value of their nose will be closer to their left eye's X value, and likewise looking to the right brings it closer to the right eye's X value.
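A small C++ sketch of that comparison. Normalizing by half the eye spacing is an assumption added here so that the value does not depend on how large the face appears in the image; `yawOffset` is a hypothetical name. Whether a negative value corresponds to the user's own left or right depends on whether the camera image is mirrored, so the code only speaks of smaller and larger image x.

```cpp
#include <cassert>

struct Point2 { double x, y; };

// Horizontal position of the nose relative to the eyes:
//    0 -> nose centered between the eyes
//   -1 -> nose directly below the eye with the smaller image x (eyeA)
//   +1 -> nose directly below the eye with the larger image x (eyeB)
double yawOffset(Point2 eyeA, Point2 eyeB, Point2 nose) {
    double halfSpan = (eyeB.x - eyeA.x) / 2.0;
    assert(halfSpan > 0.0);  // eyeA must be the eye with the smaller x
    double midX = (eyeA.x + eyeB.x) / 2.0;
    return (nose.x - midX) / halfSpan;
}
```

This is a unitless indicator, not an angle; mapping it to degrees would need the kind of per-person calibration discussed in the first answer.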

You can detect pitch by looking at the Y value of the nose: if the user is looking up, the nose's Y value will be closer to the eyes' Y values, and if they look down, it will be further away from them.
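The same idea for pitch, sketched in C++. Dividing by the eye spacing is again an added assumption to remove the dependence on face size, and it means a strong yaw (which shortens the eye spacing) will distort the value. `pitchRatio` is a hypothetical name, and image y is assumed to grow downward.

```cpp
#include <cassert>
#include <cmath>

struct Point2 { double x, y; };

// Vertical distance of the nose below the eye line, in units of eye spacing.
// Following the answer's heuristic, a smaller value suggests looking up and a
// larger value suggests looking down. The neutral value differs per person.
double pitchRatio(Point2 eyeA, Point2 eyeB, Point2 nose) {
    double eyeSpan = std::hypot(eyeB.x - eyeA.x, eyeB.y - eyeA.y);
    assert(eyeSpan > 0.0);
    double eyeLineY = (eyeA.y + eyeB.y) / 2.0;
    return (nose.y - eyeLineY) / eyeSpan;
}
```

In practice the value would be compared against the ratio measured for the same person in a frontal frame rather than against a fixed constant.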

Now, this is of course not tremendously accurate and won't give you exact angles; however, you can use this information to classify each value into certain groups, i.e. (looking forward, looking left, looking really left).
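Such a grouping could be sketched in C++ as a simple threshold test on a normalized horizontal nose offset (0 when the nose is centered between the eyes, about ±1 when it is level with one eye). The `Gaze` enum, the function name and the threshold values 0.2 and 0.5 are all illustrative assumptions that would need tuning on real data; "left" here means toward smaller image x, which may or may not be the user's own left.

```cpp
#include <cassert>

enum class Gaze { FarLeft, Left, Forward, Right, FarRight };

// Bucket a normalized horizontal offset into coarse groups.
// slight and strong are hypothetical thresholds, not measured values.
Gaze classifyYaw(double offset, double slight = 0.2, double strong = 0.5) {
    assert(0.0 < slight && slight < strong);
    if (offset <= -strong) return Gaze::FarLeft;
    if (offset <= -slight) return Gaze::Left;
    if (offset < slight) return Gaze::Forward;
    if (offset < strong) return Gaze::Right;
    return Gaze::FarRight;
}
```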

The only thing I can see affecting the calculation of all three in one image is that if the roll is fairly drastic, calculating the yaw might be troublesome because the X axis is no longer flat.

You can solve this by correcting the image through a 2D rotation. You will need to find how much the image needs to be rotated, with

Value = (right eye Y / 2) - (left eye Y / 2)

With this information you can correct the image and continue with processing (to rotate the image, look up creating a 2D rotation matrix and using warp affine).
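If only the landmark points are needed for the later yaw and pitch checks, the same correction can be applied to the points instead of the whole image. The C++ sketch below is a swapped-in variant, not the answer's exact procedure: it measures the roll with `atan2` on the eye line (rather than the Y difference above) and rotates the three points about the eye midpoint with a plain 2D rotation matrix. The function names are hypothetical. For rotating the full image in OpenCV, `cv::getRotationMatrix2D` and `cv::warpAffine` are the usual pair, and their angle sign convention should be checked against the OpenCV documentation.

```cpp
#include <cassert>
#include <cmath>

struct Point2 { double x, y; };

// Rotate p about center by angleRad using the 2D rotation matrix
// [cos -sin; sin cos] applied to the offset from center.
Point2 rotateAbout(Point2 p, Point2 center, double angleRad) {
    double c = std::cos(angleRad), s = std::sin(angleRad);
    double dx = p.x - center.x, dy = p.y - center.y;
    return Point2{center.x + c * dx - s * dy, center.y + s * dx + c * dy};
}

// Measures the roll of the eye line (radians), then levels the three
// landmarks in place by rotating them about the eye midpoint by -roll.
// eyeA must be the eye with the smaller image x. Returns the measured roll.
double levelLandmarks(Point2& eyeA, Point2& eyeB, Point2& nose) {
    assert(eyeB.x > eyeA.x);
    double roll = std::atan2(eyeB.y - eyeA.y, eyeB.x - eyeA.x);
    Point2 mid{(eyeA.x + eyeB.x) / 2.0, (eyeA.y + eyeB.y) / 2.0};
    eyeA = rotateAbout(eyeA, mid, -roll);
    eyeB = rotateAbout(eyeB, mid, -roll);
    nose = rotateAbout(nose, mid, -roll);
    return roll;
}
```

After this call the two eye points share the same Y value, so the X axis is "flat" again for the nose comparisons.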

Sorry if this is a bit of a necro, but I found the above method to be pretty successful and I hope it helps someone.
