
YoloV3 deployment on JETSON TX2

I am facing a problem with deploying YOLO object detection on the TX2. I use a pre-trained YOLOv3 model (trained on the COCO dataset) to detect a limited set of objects (I mostly care about five classes, not all of them). On my laptop the speed is too low for real-time detection and the accuracy is not perfect (but acceptable). I am thinking of making it faster with multithreading or multiprocessing on my laptop; is that possible with YOLO? But my main problem is that the algorithm does not run at all on the Raspberry Pi and the NVIDIA TX2.
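What I have in mind is roughly the pattern below, where detect_objects() is just a placeholder for whatever YOLO inference call is used: a capture thread keeps only the newest frame while the main thread runs the detector.

```python
import queue
import threading

import cv2

def detect_objects(frame):
    """Placeholder for the actual YOLO inference call."""
    return []

frames = queue.Queue(maxsize=1)

def capture_loop(src=0):
    cap = cv2.VideoCapture(src)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Drop the stale frame so inference always sees the newest one.
        if frames.full():
            try:
                frames.get_nowait()
            except queue.Empty:
                pass
        frames.put(frame)
    cap.release()

threading.Thread(target=capture_loop, daemon=True).start()

while True:
    frame = frames.get()
    detections = detect_objects(frame)  # the heavy GPU work stays in the main thread
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
```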

Here are my questions:

  1. In general, is it possible to run YOLOv3 on the TX2 without any modifications such as accelerators or model compression techniques?

  2. I cannot run the model on the TX2. First I got an error regarding the camera, so I decided to run the model on a video; this time I got the 'cannot allocate memory in static TLS block' error. What is the reason for this error? The model is too big: it uses 16 GB of GPU memory on my laptop, while the GPU memory of the Raspberry Pi and the TX2 is less than 8 GB. As far as I know there are two solutions: using a smaller model, or using TensorRT or pruning. Do you know of any other way?

  3. If I use tiny-YOLO I will get lower accuracy, and that is not what I want. Is there any way to run an object detection model in real time with high performance in terms of both accuracy and speed (FPS) on a Raspberry Pi or an NVIDIA TX2?

  4. If I clean the COCO data to keep just the objects I care about and then train the same model, I would get higher accuracy and speed, but the size would not change. Am I correct?

  5. In general, what is the best model in terms of accuracy for real-time detection, and what is the best in terms of speed?

  6. How about MobileNet? Is it better than the YOLO models in terms of both accuracy and speed?

1- Yes, it is possible. I have already run YOLOv3 on a Jetson Nano.

2- It depends on the model and on the input resolution of the data. You can decrease the input resolution. Input images are transferred to GPU VRAM to be used by the model, and large input sizes allocate a lot of memory. As far as I remember, I ran the normal YOLOv3 on a Jetson Nano (which is weaker than the TX2) two years ago. Also, you can use YOLOv3-tiny and TensorRT, as you mention. There are many sources on the web, like this & this.
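For example, with OpenCV's DNN module the network input size is just the blob size, so shrinking it reduces both memory use and latency. A minimal sketch, assuming the standard Darknet yolov3.cfg / yolov3.weights files and a test image (the file paths are placeholders):

```python
import cv2

# Pre-trained Darknet YOLOv3 (paths are placeholders).
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

img = cv2.imread("test.jpg")

# A 320x320 blob instead of a 608x608 one uses far less memory and runs faster,
# at some cost in accuracy on small objects.
blob = cv2.dnn.blobFromImage(img, scalefactor=1 / 255.0, size=(320, 320),
                             swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())
```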

3- I suggest you have a look here. In that repo, you can do transfer learning with your own dataset, optimize the model with TensorRT, and run it on the Jetson.
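Independent of that repo, the TensorRT part usually boils down to building an engine from an ONNX export. A rough sketch with the TensorRT 8.x Python API, assuming the model has already been exported to a file named yolov3.onnx (not the exact workflow of the repo above):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX export (file name is a placeholder).
with open("yolov3.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # FP16 roughly halves memory and raises FPS on Jetson

engine = builder.build_serialized_network(network, config)
with open("yolov3_fp16.engine", "wb") as f:
    f.write(engine)
```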

4- The size does not depend on the dataset; it depends on the model architecture (because the size is the weights it contains). The speed probably will not change. The accuracy depends on your dataset, and it can end up better or worse. If any COCO class is similar to a class in your dataset, I suggest you use transfer learning.

5- You have to find the right model for your case: small enough, accurate enough, and fast enough. There is no single best model; there is a best model for your case, which also depends on your dataset. You can compare the accuracy and FPS of several models here.

6- Most people use MobileNet as a feature extractor. Read this paper; you will see that YOLOv3 has better accuracy, while SSD with a MobileNet backbone has better FPS. I suggest you use the jetson-inference repo.

By using the jetson-inference repo, I get sufficient accuracy with an SSD model and get 30 FPS. Also, I suggest you use a MIPI-CSI camera on the Jetson; it is faster than a USB camera.
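The detection loop with the jetson-inference Python bindings is only a few lines; this is close to the repo's own detectnet example (the module names and the camera URI depend on the install and on which camera is wired up):

```python
import jetson.inference
import jetson.utils

# SSD-MobileNet-v2 pre-trained on COCO; the weights are fetched by the repo's installer.
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)

camera = jetson.utils.videoSource("csi://0")      # MIPI-CSI camera; "/dev/video0" for USB
display = jetson.utils.videoOutput("display://0")

while display.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)                  # runs TensorRT inference and overlays boxes
    display.Render(img)
    display.SetStatus("detectnet | {:.0f} FPS".format(net.GetNetworkFPS()))
```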

I fixed problems 1 and 2 just by swapping the import order of OpenCV and TensorFlow inside the script. Now I can run YOLOv3 on the TX2 without any other modification. I get an average of 3 FPS.
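For anyone hitting the same 'cannot allocate memory in static TLS block' error, the change was only the order of the imports at the top of the script, roughly like this (the order that works may depend on how OpenCV and TensorFlow were built):

```python
# Importing cv2 before tensorflow (instead of after it) was enough to avoid the
# "cannot allocate memory in static TLS block" error on the TX2 in my case.
import cv2
import tensorflow as tf
```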
