
Kinect for Windows v2 depth to color image misalignment

Currently I am developing a tool for the Kinect for Windows v2 (similar to the one in the XBOX ONE). I tried to follow some examples, and I have a working example that shows the camera image, the depth image, and an image that maps the depth to the RGB using OpenCV. But I see that it duplicates my hand when doing the mapping, and I think it is due to something wrong in the coordinate mapper part.

Here is an example of it: [image: wrong mapping with the duplicated hand]

And here is the code snippet that creates the image (the rgbd image in the example):

void KinectViewer::create_rgbd(cv::Mat& depth_im, cv::Mat& rgb_im, cv::Mat& rgbd_im){
    // Map every depth pixel to a coordinate in the color image.
    HRESULT hr = m_pCoordinateMapper->MapDepthFrameToColorSpace(cDepthWidth * cDepthHeight, (UINT16*)depth_im.data, cDepthWidth * cDepthHeight, m_pColorCoordinates);
    rgbd_im = cv::Mat::zeros(depth_im.rows, depth_im.cols, CV_8UC3);
    double minVal, maxVal;
    cv::minMaxLoc(depth_im, &minVal, &maxVal);
    for (int i = 0; i < cDepthHeight; i++){
        for (int j = 0; j < cDepthWidth; j++){
            UINT16 depth = depth_im.at<UINT16>(i, j);
            // Keep only valid depth values inside the [min_z, max_z] percentage band.
            if (depth > 0 && depth < maxVal * (max_z / 100) && depth > maxVal * min_z / 100){
                ColorSpacePoint colorPoint = m_pColorCoordinates[i * cDepthWidth + j];
                // Round the mapped coordinate to the nearest color pixel.
                int colorX = (int)(floor(colorPoint.X + 0.5));
                int colorY = (int)(floor(colorPoint.Y + 0.5));
                if ((colorX >= 0) && (colorX < cColorWidth) && (colorY >= 0) && (colorY < cColorHeight))
                {
                    rgbd_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(colorY, colorX);
                }
            }
        }
    }
}

Does anyone have a clue how to solve this? How can I prevent this duplication?

Thanks in advance

UPDATE:

If I do a simple depth image thresholding, I obtain the following image: [image: thresholded depth]

This is more or less what I expected to happen, without the duplicate hand in the background. Is there a way to prevent this duplicate hand in the background?

I suggest you use the BodyIndexFrame to identify whether a specific value belongs to a player or not. This way, you can reject any RGB pixel that does not belong to a player and keep the rest of them. I do not think that the CoordinateMapper is lying.

A few notes:

  • Include the BodyIndexFrame source in your frame reader
  • Use MapColorFrameToDepthSpace instead of MapDepthFrameToColorSpace; this way, you'll get the HD image for the foreground
  • Find the corresponding DepthSpacePoint and depthX, depthY, instead of ColorSpacePoint and colorX, colorY

Here is my approach when a frame arrives (it's in C#):

depthFrame.CopyFrameDataToArray(_depthData);
colorFrame.CopyConvertedFrameDataToArray(_colorData, ColorImageFormat.Bgra);
bodyIndexFrame.CopyFrameDataToArray(_bodyData);

_coordinateMapper.MapColorFrameToDepthSpace(_depthData, _depthPoints);

Array.Clear(_displayPixels, 0, _displayPixels.Length);

for (int colorIndex = 0; colorIndex < _depthPoints.Length; ++colorIndex)
{
    DepthSpacePoint depthPoint = _depthPoints[colorIndex];

    if (!float.IsNegativeInfinity(depthPoint.X) && !float.IsNegativeInfinity(depthPoint.Y))
    {
        int depthX = (int)(depthPoint.X + 0.5f);
        int depthY = (int)(depthPoint.Y + 0.5f);

        if ((depthX >= 0) && (depthX < _depthWidth) && (depthY >= 0) && (depthY < _depthHeight))
        {
            int depthIndex = (depthY * _depthWidth) + depthX;
            byte player = _bodyData[depthIndex];

            // Identify whether the point belongs to a player
            if (player != 0xff)
            {
                int sourceIndex = colorIndex * BYTES_PER_PIXEL;

                _displayPixels[sourceIndex] = _colorData[sourceIndex++];    // B
                _displayPixels[sourceIndex] = _colorData[sourceIndex++];    // G
                _displayPixels[sourceIndex] = _colorData[sourceIndex++];    // R
                _displayPixels[sourceIndex] = 0xff;                         // A
            }
        }
    }
}

Here is the initialization of the arrays:

BYTES_PER_PIXEL = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;

_colorWidth = colorFrame.FrameDescription.Width;
_colorHeight = colorFrame.FrameDescription.Height;
_depthWidth = depthFrame.FrameDescription.Width;
_depthHeight = depthFrame.FrameDescription.Height;
_bodyIndexWidth = bodyIndexFrame.FrameDescription.Width;
_bodyIndexHeight = bodyIndexFrame.FrameDescription.Height;
_depthData = new ushort[_depthWidth * _depthHeight];
_bodyData = new byte[_depthWidth * _depthHeight];
_colorData = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_displayPixels = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_depthPoints = new DepthSpacePoint[_colorWidth * _colorHeight];

Notice that the _depthPoints array has a 1920x1080 size: MapColorFrameToDepthSpace gives you one DepthSpacePoint per color pixel.

Once again, the most important thing is to use the BodyIndexFrame source.

Finally I got some time to write the long-awaited answer.

Let's start with some theory to understand what is really happening, and then a possible answer.

We should start by knowing how to go from a 3D point cloud, which has the depth camera as its coordinate system origin, to an image in the image plane of the RGB camera. To do that, it is enough to use the pinhole camera model:

s · [u, v, 1]^T = K · [R | t] · [X, Y, Z, 1]^T

Here, u and v are the coordinates in the image plane of the RGB camera. The first matrix on the right side of the equation is the camera matrix, a.k.a. the intrinsics of the RGB camera. The following matrix holds the rotation and translation of the extrinsics, or better said, the transformation needed to go from the depth camera coordinate system to the RGB camera coordinate system. The last part is the 3D point.
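As a small illustration, here is that projection written out in C++. It is only a sketch: the intrinsics (fx, fy, cx, cy) and extrinsics (R, t) below are made-up placeholder values, not the actual Kinect v2 calibration.

struct Pixel { double u, v, z; };

// Project a 3D point given in the depth camera frame into the RGB image plane,
// following the pinhole model above: apply [R | t], then the intrinsics K.
Pixel projectToColor(double X, double Y, double Z)
{
    // Extrinsics (placeholders): rotation R (identity here) and translation t
    // from the depth camera frame to the RGB camera frame, in meters.
    const double R[3][3] = { {1,0,0}, {0,1,0}, {0,0,1} };
    const double t[3] = { -0.052, 0.0, 0.0 };

    double Xc = R[0][0]*X + R[0][1]*Y + R[0][2]*Z + t[0];
    double Yc = R[1][0]*X + R[1][1]*Y + R[1][2]*Z + t[1];
    double Zc = R[2][0]*X + R[2][1]*Y + R[2][2]*Z + t[2];

    // Intrinsics of the RGB camera (placeholders): focal lengths and principal point.
    const double fx = 1050.0, fy = 1050.0, cx = 960.0, cy = 540.0;

    // Perspective division gives the pixel coordinates u, v.
    return { fx * Xc / Zc + cx, fy * Yc / Zc + cy, Zc };
}

Note that Zc, the depth of the point after the extrinsics are applied, is exactly the value you would need for the z-buffer test discussed below.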

Basically, something like this is what the Kinect SDK does. So, what could go wrong that makes the hand get duplicated? Well, actually more than one point projects to the same pixel...

To put it in other words, in the context of the problem in the question:

The depth image is a representation of an ordered point cloud, and I am querying the u, v values of each of its pixels, which in reality can easily be converted to 3D points. The SDK gives you the projection, but several points can end up at the same pixel (usually, the larger the distance along the z axis between two neighboring points, the more easily this problem appears).

Now, the big question: how can you avoid this? Well, I am not sure it is possible using the Kinect SDK, since you do not know the Z value of the points AFTER the extrinsics are applied, so it is not possible to use a technique like Z-buffering. However, you may assume the Z value will be quite similar and use the one from the original point cloud (at your own risk).

If you were doing it manually, and not with the SDK, you could apply the extrinsics to the points and then project them into the image plane, marking in another matrix which point is mapped to which pixel; if a point is already mapped to that pixel, check the z values, compare them, and always keep the point closest to the camera. Then you will have a valid mapping without any problems. This is a rather naive way; you can probably find better ones, since the problem is now clear :)
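To make that concrete, below is a minimal sketch of that naive z-buffer idea applied to the mapping loop from the question. It reuses the names from the question's code (m_pCoordinateMapper, m_pColorCoordinates, cDepthWidth, cDepthHeight, cColorWidth, cColorHeight) and, since the SDK does not expose the Z value after the extrinsics, it approximates it with the raw depth value, as suggested above; take it as an illustration of the idea, not as a verified fix.

void KinectViewer::create_rgbd_zbuffered(cv::Mat& depth_im, cv::Mat& rgb_im, cv::Mat& rgbd_im){
    m_pCoordinateMapper->MapDepthFrameToColorSpace(cDepthWidth * cDepthHeight, (UINT16*)depth_im.data, cDepthWidth * cDepthHeight, m_pColorCoordinates);
    rgbd_im = cv::Mat::zeros(depth_im.rows, depth_im.cols, CV_8UC3);
    // Z-buffer at color resolution, initialized to "infinitely far".
    cv::Mat zbuffer(cColorHeight, cColorWidth, CV_16UC1, cv::Scalar(0xFFFF));
    // Pass 1: for every color pixel, remember the smallest depth that maps to it.
    for (int i = 0; i < cDepthHeight; i++){
        for (int j = 0; j < cDepthWidth; j++){
            UINT16 d = depth_im.at<UINT16>(i, j);
            if (d == 0) continue;
            ColorSpacePoint cp = m_pColorCoordinates[i * cDepthWidth + j];
            int colorX = (int)(floor(cp.X + 0.5));
            int colorY = (int)(floor(cp.Y + 0.5));
            if (colorX < 0 || colorX >= cColorWidth || colorY < 0 || colorY >= cColorHeight) continue;
            UINT16& zb = zbuffer.at<UINT16>(colorY, colorX);
            if (d < zb) zb = d;
        }
    }
    // Pass 2: copy the color only when this depth pixel is (close to) the winner,
    // i.e. the closest point that projects onto that color pixel.
    const UINT16 tolerance = 20; // millimeters; tune for your scene
    for (int i = 0; i < cDepthHeight; i++){
        for (int j = 0; j < cDepthWidth; j++){
            UINT16 d = depth_im.at<UINT16>(i, j);
            if (d == 0) continue;
            ColorSpacePoint cp = m_pColorCoordinates[i * cDepthWidth + j];
            int colorX = (int)(floor(cp.X + 0.5));
            int colorY = (int)(floor(cp.Y + 0.5));
            if (colorX < 0 || colorX >= cColorWidth || colorY < 0 || colorY >= cColorHeight) continue;
            if (d <= zbuffer.at<UINT16>(colorY, colorX) + tolerance)
                rgbd_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(colorY, colorX);
        }
    }
}

Since the depth image (512x424) is much smaller than the color image (1920x1080), in practice you would splat each depth sample over a small neighborhood of color pixels in the first pass rather than a single pixel, but the idea stays the same: for each color pixel, keep only the point closest to the RGB camera.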

I hope it is clear enough.

PS: I do not have a Kinect 2 at the moment, so I can't check whether there has been an update related to this issue or whether the same thing is still happening. I used the first released version (not a pre-release) of the SDK... so a lot of changes may have happened... If someone knows whether this was solved, just leave a comment :)
