
Predicting the surface of the car using its 2d bbox and plate bbox

Original post: 2020-03-17 13:36:59 8 1 python/ image/ tensorflow/ pytorch/ object-detection

I'm trying to solve an interesting problem without using a GPU-intensive model at inference time (no deep learning).

Input: A 2D image containing one or more cars, with an accurate bbox for each car and a bbox of the car's license plate. (We also know that the cameras are located just a bit above the cars.)

Output: A prediction of the car's surface (the bottom face of the cuboid in a 3D bbox).

Approach 1: I'm trying to leverage the prior knowledge I have: not only the 2D bbox of the car but also the 2D bbox of the plate, which can give me the orientation of the car. I thought about taking the angle between the center of the car's bbox and the center of the plate's bbox to work out which direction the car is facing.
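A minimal sketch of the angle computation described in Approach 1, assuming boxes are given as `(x_min, y_min, x_max, y_max)` in pixel coordinates with y growing downward. The function names are made up for illustration. Note that this angle alone cannot tell a front plate from a rear plate.

```python
import math

def bbox_center(bbox):
    """Return the (cx, cy) center of an (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = bbox
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

def plate_offset_angle(car_bbox, plate_bbox):
    """Angle in degrees of the vector from the car center to the plate center.

    Image coordinates: x grows right, y grows down. 0 degrees means the plate
    is directly right of the car center, 90 degrees directly below it.
    """
    car_cx, car_cy = bbox_center(car_bbox)
    plate_cx, plate_cy = bbox_center(plate_bbox)
    return math.degrees(math.atan2(plate_cy - car_cy, plate_cx - car_cx))

# Hypothetical boxes: the plate sits in the lower-left part of the car box.
car = (100, 100, 300, 220)
plate = (120, 190, 160, 205)
print(plate_offset_angle(car, plate))
```

An angle near 90 degrees suggests the car is seen head-on or from behind; angles further from 90 suggest the car is rotated toward the side where the plate appears.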

Once I know the direction the car is facing, I can also roughly locate one of the edges of the surface: the 3D bbox is bounded by the 2D bbox (so the surface is bounded too), and the plate's 2D bbox is only a few pixels away from the surface, so one edge of the surface can be estimated.

The problem is determining the lateral edges: how long should they be? I'm not sure how to estimate the lateral sides of the bottom surface, but I think they can somehow be inferred from the size of the car's 2D bbox (which, again, should bound that surface). Maybe I'll be able to solve this after finding the first edge of the surface and then exploring ways to infer its lateral edges.

Approach 2: Annotate the data with 3D bboxes using a pre-trained model, then try to predict the 3D bbox from the 2D bbox (and probably some more priors, such as the plate's 2D bbox). I wouldn't use a deep model for this, just a simple NN with a few layers, trained in a supervised manner, to predict the 3D bbox.
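To make Approach 2 concrete, here is a rough sketch of what the inputs and the forward pass of such a small network could look like, in plain Python. Everything here is an assumption for illustration: the feature encoding, the layer sizes, and the choice of 8 outputs (four bottom-face corners as x, y pairs normalized to the car bbox). The weights are random, so training against the pre-annotated 3D boxes is still required.

```python
import random

def make_features(car_bbox, plate_bbox):
    """Encode the plate box relative to the car box, so features are scale-invariant."""
    cx0, cy0, cx1, cy1 = car_bbox
    px0, py0, px1, py1 = plate_bbox
    w, h = cx1 - cx0, cy1 - cy0
    return [
        (px0 - cx0) / w, (py0 - cy0) / h,
        (px1 - cx0) / w, (py1 - cy0) / h,
        w / h,  # aspect ratio of the car box
    ]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, relu):
    """Apply one dense layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out

def predict_surface(features, layers):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    x = features
    for i, layer in enumerate(layers):
        x = dense(x, layer, relu=(i < len(layers) - 1))
    return x

rng = random.Random(0)
layers = [init_layer(5, 16, rng), init_layer(16, 16, rng), init_layer(16, 8, rng)]
feats = make_features((100, 100, 300, 220), (120, 190, 160, 205))
print(predict_surface(feats, layers))  # untrained, so the values are meaningless
```

In practice a framework such as PyTorch or TensorFlow would handle the layers and the training loop; the point here is only that the model can be very small because the inputs are a handful of box coordinates rather than pixels.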

1 Answer

Deep-learning-based object detection methods tend to achieve very high detection accuracy. Deep neural networks are the prevailing way to improve bounding-box accuracy, and designing a reasonable regression loss function is also important. So if accuracy is an important factor in your project, you may need to consider using deep learning.

But if accuracy doesn't matter that much and you really prefer not to use deep learning, you can use other, simpler methods.

Conventional 2D object detection yields axis-aligned bounding boxes with 4 degrees of freedom (DoF): center (x, y) and 2D size (w, h). 3D bounding boxes in an autonomous-driving context generally have 7 DoF: 3D physical size (w, h, l), 3D center location (x, y, z), and yaw. Note that roll and pitch are normally assumed to be zero. Now the question is: how do we recover a 7-DoF object from a 4-DoF one? You can find a solution and an explanation of the approach in this research, but it is somewhat complex because it comes from a research paper.
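To show how the 7-DoF parameterization relates to the surface the question asks for, here is a small sketch that computes the four corners of the cuboid's bottom face from (x, y, z, w, h, l, yaw). The axis convention is an assumption (z points up, yaw rotates about z, length runs along the car's heading); datasets differ on this, so adapt it to yours. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Box3D:
    x: float    # center
    y: float
    z: float
    w: float    # width (side to side)
    h: float    # height
    l: float    # length (front to back)
    yaw: float  # rotation about the vertical axis, in radians

def bottom_face(box):
    """Return the four corners of the cuboid's bottom face, assuming z is up."""
    c, s = math.cos(box.yaw), math.sin(box.yaw)
    z_bottom = box.z - box.h / 2.0
    corners = []
    for dl, dw in ((1, 1), (1, -1), (-1, -1), (-1, 1)):
        # Corner offset in the car's own frame, then rotated by yaw.
        lx, ly = dl * box.l / 2.0, dw * box.w / 2.0
        corners.append((box.x + c * lx - s * ly, box.y + s * lx + c * ly, z_bottom))
    return corners

# A 4 m long, 2 m wide, 2 m tall box centered 1 m above the ground, no rotation.
for corner in bottom_face(Box3D(0.0, 0.0, 1.0, 2.0, 2.0, 4.0, 0.0)):
    print(corner)
```

Projecting these corners into the image with the camera matrix gives the quadrilateral that has to fit inside the 2D bbox, which is the constraint the question relies on.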

Regarding your second approach:

"Annotating the data with 3d bboxes with a pre-trained model"

You can try that, and then do all the work of creating the 3D bbox at inference time. This problem is too specific and too complex to answer directly, even more so without deep learning. But I hope my answer helps a bit.

Here is another approach I can share, in case you want to consider it:

You can also train your own model with a separate class for each direction of the car. Preparing the dataset for it may take a lot of time, but with that model you can easily detect the car's direction. You may then be able to write a function that creates a 3D bbox based on the detected direction. I cannot recommend this approach, though, if you would rather not build your own annotated dataset, since that really takes a lot of time.
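As a sketch of what "a class for each direction" could mean in practice, the snippet below bins a yaw angle into a fixed number of direction classes and maps a predicted class back to a representative angle. The choice of 8 classes (one per 45-degree sector) and the function names are assumptions for illustration only.

```python
NUM_DIRECTIONS = 8  # hypothetical choice: one class per 45-degree sector

def yaw_to_class(yaw_deg, num_classes=NUM_DIRECTIONS):
    """Map a yaw angle in degrees to the index of the nearest direction class."""
    sector = 360.0 / num_classes
    return int(((yaw_deg % 360.0) + sector / 2.0) // sector) % num_classes

def class_to_yaw(class_id, num_classes=NUM_DIRECTIONS):
    """Return the center angle in degrees of a direction class."""
    return class_id * 360.0 / num_classes

# Labeling: turn an annotated yaw into a class id for training.
label = yaw_to_class(-90.0)
# Inference: turn the predicted class back into an approximate yaw.
print(label, class_to_yaw(label))
```

The recovered yaw is only accurate to within half a sector, so the number of classes trades annotation and training effort against angular resolution.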

You can use OpenCV to create the 3D bbox from the specific values you need from the 2D bbox.

But note that this will not give you the best accuracy. Deep learning is still the better option when accuracy matters. You can find many implementations of this online.