
Film coordinate to world coordinate

I am working on building a 3D point cloud from feature matching, using OpenCV 3.1 and OpenGL.

I have implemented 1) camera calibration (so I have the camera's intrinsic matrix) and 2) feature extraction (so I have 2D points in pixel coordinates). I went through a few websites, but they generally all describe the forward flow of projecting 3D object points to pixel points, whereas I am doing the complete backward projection. Here is the ppt that explains it well.

I have computed film coordinates (u,v) from pixel coordinates (x,y) with the help of the intrinsic matrix. Can anyone shed light on how I can recover the "Z" of the camera coordinate (X,Y,Z) from the film coordinate (x,y)?

Please guide me on how I can use OpenCV functions such as solvePnP, recoverPose, findFundamentalMat, and findEssentialMat toward this goal.

You can't, if all you have is 2D images from that single camera location.

In theory you could use heuristics to infer a Z stacking. But mathematically your problem is underdetermined: there are literally infinitely many different Z coordinates that would satisfy your constraints. You have to supply some extra information. For example, you could move your camera around over several frames (Google "structure from motion"), use multiple cameras, or use a camera that has a depth sensor and gives you complete XYZ tuples (a Kinect or similar).

Update due to comment:

For every pixel in a 2D image there is an infinite number of points that project to it. The technical term for that set is a ray. If you have two 2D images of roughly the same volume of space, each image's set of rays (one per pixel) intersects the set of rays corresponding to the other image. Which is to say: if you determine the ray for a pixel in image #1, it maps to a line of pixels covered by that ray in image #2. Selecting a particular pixel along that line in image #2 gives you the XYZ tuple for that point.

Since you're rotating the object by a certain angle θ around a certain axis a between images, you actually have a lot of images to work with. All you have to do is derive each camera location by an additional transformation, inverse(translate(-a)·rotate(θ)·translate(a)).
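Composing that transformation can be sketched with plain 4×4 homogeneous matrices; a minimal illustration in pure Python (all helper names are my own, no OpenCV required):

```python
import math

def translate(tx, ty, tz):
    # 4x4 homogeneous translation matrix (row-major nested lists)
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def rotate_y(theta):
    # 4x4 rotation about the y axis by theta radians
    c, s = math.cos(theta), math.sin(theta)
    return [[ c, 0, s, 0],
            [ 0, 1, 0, 0],
            [-s, 0, c, 0],
            [ 0, 0, 0, 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(M, p):
    # apply a 4x4 matrix to a 3D point (w assumed 1)
    v = [p[0], p[1], p[2], 1.0]
    r = [sum(M[i][k] * v[k] for k in range(4)) for i in range(4)]
    return (r[0], r[1], r[2])

def object_rotation(theta, a):
    # rotate the object by theta about a vertical axis through point a:
    # translate(a) . rotate_y(theta) . translate(-a).
    # The inverse (the camera-location transform from the text) is the
    # same composition with -theta, since the transform is rigid.
    ax, ay, az = a
    return matmul(translate(ax, ay, az),
                  matmul(rotate_y(theta), translate(-ax, -ay, -az)))
```

A point lying on the rotation axis stays fixed under `object_rotation`, which is a quick sanity check for the composition order.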

Then do the following: select an image to start with. For the particular pixel you're interested in, determine the ray it corresponds to. For that, simply assume two Z values for the pixel; 0 and 1 work just fine. Transform them back into the space of your object, then project them into the view space of the next camera you chose to use; the result will be two points in the image plane (possibly outside the limits of the actual image, but that's not a problem). These two points define a line within that second image. Find the pixel along that line that matches the pixel in the first image, and project it back into space as done with the first image. Due to numerical round-off errors you're not going to get a perfect intersection of the rays in 3D space, so find the point where the rays are closest to each other (this involves solving a quadratic polynomial, which is trivial).
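That "closest approach of two rays" step reduces to a 2×2 linear system once you set the gradient of the squared distance to zero; a minimal sketch (function name is mine):

```python
def closest_point_between_rays(p1, d1, p2, d2):
    """Return the midpoint of the shortest segment between two 3D rays
    p1 + t1*d1 and p2 + t2*d2 -- the 'almost intersection' point.
    Minimizes |(p1 + t1*d1) - (p2 + t2*d2)|^2 over t1, t2."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    w = tuple(x - y for x, y in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    e, f = dot(d1, w), dot(d2, w)
    denom = a * c - b * b              # zero when the rays are parallel
    if abs(denom) < 1e-12:
        t1, t2 = 0.0, f / c            # parallel: any pairing is 'closest'
    else:
        t1 = (b * f - c * e) / denom
        t2 = (a * f - b * e) / denom
    q1 = tuple(p + t1 * d for p, d in zip(p1, d1))
    q2 = tuple(p + t2 * d for p, d in zip(p2, d2))
    return tuple((x + y) / 2 for x, y in zip(q1, q2))
```

For two rays that genuinely intersect, the midpoint collapses to the intersection; otherwise it is the point halfway along the common perpendicular.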

To select which pixel you want to match between images you can use a feature motion tracking algorithm, as used in video compression and the like. The basic idea is that for every pixel, a correlation of its surroundings is computed against the same region in the previous image. Where the correlation peaks is where the pixel most likely moved from.
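A toy 1D version of such correlation matching, using sum of squared differences over a small patch (the function name, patch size, and scanline representation are illustrative only):

```python
def best_match_x(row_a, row_b, x, half=2):
    """Find where the patch around column x of scanline row_a best matches
    in scanline row_b, by minimizing the sum of squared differences (SSD).
    This is a 1D cut-down of the block matching used in video compression."""
    patch = row_a[x - half : x + half + 1]
    best_x, best_cost = None, float("inf")
    for cx in range(half, len(row_b) - half):
        cand = row_b[cx - half : cx + half + 1]
        cost = sum((p - c) ** 2 for p, c in zip(patch, cand))
        if cost < best_cost:
            best_x, best_cost = cx, cost
    return best_x
```

In 2D the same idea runs over rectangular patches, and the displacement from `x` to the returned column is the motion vector for that pixel.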

With this pixel tracking in place you can then derive the structure of the object. This is essentially what structure from motion does.

With a single camera and an object rotating on a fixed rotation platform, I would implement something like this:

Setup

Each camera has a resolution xs,ys and a field of view FOV defined by two angles FOVx,FOVy, so either check your camera's data sheet or measure it. From that and the perpendicular distance (z) you can convert any pixel position (x,y) into a 3D coordinate relative to the camera (x',y',z'). So first convert the pixel position to angles:

ax = (x - (xs/2)) * FOVx / xs 
ay = (y - (ys/2)) * FOVy / ys 

and then compute the Cartesian position in 3D:

x' = distance * tan(ax)
y' = distance * tan(ay)
z' = distance
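The two snippets above fold into one small helper; a minimal sketch in plain Python (the function name is my own, angles assumed in radians, and the per-pixel angle treated as linear as in the formulas above):

```python
import math

def pixel_to_camera(x, y, xs, ys, fov_x, fov_y, distance):
    """Convert pixel (x, y) on an xs-by-ys image into camera-space
    (x', y', z'), assuming the surface point lies at the given
    perpendicular distance from the camera."""
    ax = (x - xs / 2) * fov_x / xs   # horizontal angle off the optical axis
    ay = (y - ys / 2) * fov_y / ys   # vertical angle off the optical axis
    return (distance * math.tan(ax),
            distance * math.tan(ay),
            distance)
```

As a sanity check, the image center maps onto the optical axis, i.e. (0, 0, distance).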

That is nice, but for a common image we do not know the distance. Luckily, in such a setup, if we turn the object then any convex edge will reach a maximal ax angle at the sides as it crosses the plane perpendicular to the camera. So check a few frames, and if a maximal ax is detected you can assume it is an edge (or convex bump) of the object positioned at distance.

If you also know the rotation angle ang of your platform (relative to your camera), then you can compute the un-rotated position by using the rotation formula around the y axis (the Ay matrix in the link) and the known platform center position relative to the camera (just subtract it before the un-rotation)... As I mentioned, all this is just simple geometry.
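That un-rotation step could look like this; a sketch only, assuming a vertical (y) rotation axis and a helper name of my own choosing:

```python
import math

def unrotate_point(p, ang, center):
    """Undo the platform rotation for one detected 3D point: subtract the
    platform center (in camera coordinates), rotate by -ang about the
    vertical (y) axis, then add the center back. This brings every frame's
    points into one common object-space frame."""
    x, y, z = (pc - cc for pc, cc in zip(p, center))
    c, s = math.cos(-ang), math.sin(-ang)
    xr = c * x + s * z               # y-axis rotation formula (Ay matrix)
    zr = -s * x + c * z
    cx, cy, cz = center
    return (xr + cx, y + cy, zr + cz)
```

Applying this with each frame's platform angle accumulates all per-frame edge points into a single consistent point cloud.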

In a nutshell:

  1. obtain calibration data

    FOVx,FOVy,xs,ys,distance. Some camera datasheets have only FOVx, but if the pixels are square you can compute FOVy from the resolution as

     FOVx/FOVy = xs/ys 

    Beware: with multi-resolution camera modes the FOV can be different for each resolution!!!

  2. extract the silhouette of your object in the video for each frame

    you can subtract the background image to ease up the detection

  3. obtain the platform angle for each frame

    so either use IRC data or place known markers on the rotation disc and detect/interpolate...

  4. detect the ax maximum

    just inspect the x coordinate of the silhouette (for each y line of the image separately), and if a peak is detected add its 3D position to your model. Let's assume a rotating rectangular box. Some of its frames could look like this:

    (image: silhouettes of the rotating box, showing where ax reaches its maximum)

    So inspect one horizontal line across all frames and find the maximal ax. To improve accuracy you can do a closed-loop regulation by turning the platform until the peak is found "exactly". Do this for all horizontal lines separately.

    btw. if you detect no ax change over a few frames, that means a circular shape with the same radius... so you can handle each such frame as an ax maximum.
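The peak detection in step 4 can be sketched for a single scanline as follows; the function name and input representation (the silhouette's rightmost x per frame) are my own, and a flat run of equal maxima, as in the circular-shape note above, simply returns every frame:

```python
def detect_edge_frames(silhouette_max_x, tol=0):
    """Given, for one horizontal scanline, the silhouette's rightmost x
    coordinate in each frame, return the frame indices where that x peaks --
    the frames where a convex edge of the object crosses the plane
    perpendicular to the camera."""
    peak = max(silhouette_max_x)
    return [i for i, x in enumerate(silhouette_max_x) if x >= peak - tol]
```

Each returned frame index, combined with that frame's platform angle and the pixel-to-camera conversion from the setup section, yields one 3D point for the model.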

Easy as pie, resulting in a 3D point cloud, which you can sort by platform angle to ease up conversion to a mesh... That angle can also be used as a texture coordinate...

But do not forget that you will lose some concave details that are hidden inside the silhouette!!!

If this approach is not enough, you can use this same setup for stereoscopic 3D reconstruction, because each rotation behaves as a new (known) camera position.

