
OpenCV - 3D real world coordinates from two perpendicular 2D images

There probably are answers, but I simply did not understand what I found. Maybe it's the language barrier. So I've decided to finally ask. What I need is to find 3D coordinates from two videos recorded by two cameras. The setup is like this:

[image: diagram of the two-camera setup]

I can't seem to grasp how to do this. What I have is

  • Pixel coordinates on both pictures (relative to the (0,0) point of each picture)
  • Focal lengths of both cameras
  • Distances of both cameras from the (0,0,0) real-world point (Ax and By)
  • Pixel size
  • The angle between the cameras, which is 90 degrees

What now? The OpenCV docs contain this formula:

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R \mid t \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} $$

I don't know what 's' is, nor the [R|T] matrix of extrinsic parameters. I don't know where the principal point (cx, cy) is or how to find it, and I can only assume setting it to 0 won't be catastrophic. Also, this formula appears to use only one of the 2D images, not both.

I know of the calibrateCamera, solvePnP, and stereoCalibrate functions, but I don't know how to use them.

I know how complex it gets when the two cameras act as a pair of "eyes"; I had hoped it would be easier when the cameras shoot perpendicular images. I now have a formula to calculate the 3D coordinates, but it is not precise enough: the error is under 1 inch, but that is still too much.

xa, ya, xb, yb - pixel coordinates from the two pictures
focalA, focalB - focal lengths of cameras A and B
W = -(Ax*xb*pixelSize - focalB*By)/(xa*pixelSize*xb*pixelSize - focalA*focalB)
X = Ax + W*xa*pixelSize
Y = W*focalA
Z = W*ya*pixelSize
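
For clarity, here is the same formula as a minimal runnable Python sketch; the function and its signature are mine, not from any library, and all lengths (focal lengths, pixel size, Ax, By) are assumed to be expressed in the same unit:

    def triangulate(xa, ya, xb, yb, focalA, focalB, pixelSize, Ax, By):
        # scale factor along camera A's viewing direction
        W = -(Ax*xb*pixelSize - focalB*By) / (xa*pixelSize*xb*pixelSize - focalA*focalB)
        X = Ax + W*xa*pixelSize   # lateral position in the world
        Y = W*focalA              # depth along camera A's optical axis
        Z = W*ya*pixelSize        # height
        return X, Y, Z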

Errors:

[image: table of measurement errors]

Those are for the focal length and pixel size provided by the manufacturer: 5400 µm and 1.75 µm. However, the errors are smallest for the values 4620 µm and 1.69 µm, where the biggest error is 2.3 cm on the X axis of point #3, the height errors almost disappear (0.2 cm max), and the rest are either 0.1 cm or 1-1.5 cm.

Beyond telling you to read about stereo vision as @YangKui suggested, I can answer some of your sub-questions.

The equation you quote is the (single-camera) 3D-to-2D projection equation. It is a projective geometry equation (hence the 1s as the last coordinates), and everything is defined only up to some scale s.

  • s is this scale factor.
  • R is the 3x3 rotation matrix of the camera relative to the world/chosen coordinate system.
  • t is the translation of the camera origin from the world/chosen coordinate system origin.
  • cx and cy are the coordinates of the principal point - the point on the image plane, in pixel units, where the Z axis intersects it. It is often assumed to be at the center of the image.
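
To make the symbols concrete, here is a small numpy sketch of that projection equation. The focal length and pixel size are the manufacturer values from the question; the principal point, pose, and world point are made-up placeholders (in OpenCV, cv2.projectPoints performs the same computation):

    import numpy as np

    fx = fy = 5400.0 / 1.75      # focal length in pixels = focal length / pixel size
    cx, cy = 320.0, 240.0        # assumed principal point (center of a 640x480 image)
    K = np.array([[fx, 0, cx],
                  [0, fy, cy],
                  [0,  0,  1]])

    R = np.eye(3)                # placeholder rotation: camera aligned with world axes
    t = np.zeros((3, 1))         # placeholder translation: camera at the world origin

    M = np.array([[1.0], [2.0], [10.0], [1.0]])  # homogeneous world point (X, Y, Z, 1)
    m = K @ np.hstack([R, t]) @ M                # this equals s * (u, v, 1)
    u, v = (m[:2] / m[2]).ravel()                # divide out the scale s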

One approach, which provides intuition if not a high-performance implementation, is to construct the camera matrix for each camera and then use nonlinear optimization to solve for the 3D point M by minimizing the reprojection error.

So come up with the camera matrices: A's camera matrix will map A's camera center in world coordinates to (0, 0, 0) in A's camera coordinates. The rotation part of A's camera matrix will map (0, 1, 0) in world coordinates to (0, 0, 1) in camera coordinates.

Now you can map world coordinates to A and B image coordinates, so for any (x, y, z) you have a corresponding 4-vector: (x_A, y_A, x_B, y_B). If you throw in the point (A_x, B_y, 0), you get out a 4-vector. The difference between that 4-vector and the measured position is your reprojection error. Throw that at a solver and it should quickly converge on an answer.
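
Here is a hedged sketch of that whole pipeline in Python using scipy. The axis assignments, principal point, and camera distances below are my assumptions encoding one reading of the 90-degree setup, not values taken from the question:

    import numpy as np
    from scipy.optimize import least_squares

    pixel_size = 1.75          # um, manufacturer value from the question
    f = 5400.0 / pixel_size    # focal length in pixels
    cx, cy = 320.0, 240.0      # assumed principal point
    K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1]])

    Ax, By = 1.0, 1.0          # assumed camera distances from the world origin (meters)

    # One possible axis assignment: camera A sits at (Ax, 0, 0) and looks along +Y;
    # camera B sits at (0, By, 0) and looks along +X. Image y points along world -Z.
    R_A = np.array([[1.0, 0, 0], [0, 0, -1], [0, 1, 0]])
    R_B = np.array([[0.0, -1, 0], [0, 0, -1], [1, 0, 0]])
    t_A = -R_A @ np.array([Ax, 0.0, 0.0])   # maps A's center to (0, 0, 0)
    t_B = -R_B @ np.array([0.0, By, 0.0])   # maps B's center to (0, 0, 0)

    def project(M, R, t):
        """Projection equation: s*(u, v, 1) = K [R|t] (X, Y, Z, 1)."""
        m = K @ (R @ M + t)
        return m[:2] / m[2]

    def reprojection_error(M, uv_a, uv_b):
        """4-vector of residuals against the measured pixel positions."""
        return np.concatenate([project(M, R_A, t_A) - uv_a,
                               project(M, R_B, t_B) - uv_b])

    uv_a = np.array([400.0, 300.0])   # measured pixels on A (placeholder)
    uv_b = np.array([350.0, 280.0])   # measured pixels on B (placeholder)

    M0 = np.array([0.1, 0.2, 0.1])    # rough initial guess in front of both cameras
    sol = least_squares(reprojection_error, M0, args=(uv_a, uv_b))
    print("estimated 3D point:", sol.x)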

You might try Multiple View Geometry in Computer Vision by Hartley and Zisserman.
