Human pose estimation/matching on smartphone

Posted 2017-10-13 10:59:05. Tags: image-processing, tensorflow, computer-vision, deep-learning, openpose

I'm working on a project where a person must mimic a predefined pose. A picture is taken of the person mimicking this pose. The person's pose is then extracted from the image and compared with the predefined pose. Finally, a scoring mechanism decides how well the two poses match, or whether they match at all.
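To make the comparison step concrete, here is a minimal sketch of one possible scoring mechanism. The question does not specify one, so everything here is an assumption: the function names, the 0.95 threshold, and the choice of cosine similarity on poses normalized for position and scale. Each pose is a list of (x, y) keypoints in the same joint order.

```python
import math

def normalize_pose(keypoints):
    """Center keypoints on their centroid and scale them to unit norm."""
    n = len(keypoints)
    cx = sum(x for x, _ in keypoints) / n
    cy = sum(y for _, y in keypoints) / n
    centered = [(x - cx, y - cy) for x, y in keypoints]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    if norm == 0:
        raise ValueError("degenerate pose: all keypoints coincide")
    return [(x / norm, y / norm) for x, y in centered]

def pose_score(pose_a, pose_b):
    """Cosine similarity of two normalized poses, in [-1, 1]."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must have the same number of keypoints")
    a = normalize_pose(pose_a)
    b = normalize_pose(pose_b)
    return sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))

def poses_match(pose_a, pose_b, threshold=0.95):
    """Hypothetical decision rule: match if the score reaches the threshold."""
    return pose_score(pose_a, pose_b) >= threshold

# Toy 5-keypoint poses (hypothetical data, not real detector output).
reference = [(0, 0), (0, 1), (1, 1), (-1, 1), (0, 2)]
# Same pose, photographed larger and at a different position in the frame.
shifted = [(10 + 2 * x, 5 + 2 * y) for x, y in reference]
# Different pose: the two "arm" keypoints are lowered.
arms_down = [(0, 0), (0, 1), (1, 0), (-1, 0), (0, 2)]
```

This sketch ignores rotation, mirrored poses, missing keypoints, and per-joint confidence, all of which a real matcher would probably need to handle.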

I want to develop for smartphones, so ideally everything runs embedded on the smartphone itself. This means the implementation must be able to run on the CPU or the smartphone GPU (for example, a Moto G5 Plus with an on-board Adreno 506 GPU, which supports OpenGL). Running embedded is not a must; I think it's also possible to offload the estimation/matching algorithm to a central server with a decent GPU. This choice between embedded and offloaded processing involves a lot of parameters (performance/computation power, server cost, accuracy, mobile battery usage, server communication delay, multi-platform support, scalability, and, less importantly, mobile data usage, ...).

I know there are some frameworks out there for human pose estimation, like OpenPose and DeeperCut. But as they all use deep learning, they require a decent GPU. Most new smartphones these days have a GPU on board, but are they capable of running these frameworks? One nuance in this case: the (multi-person) keypoint detection doesn't need to be real-time, as there is only one picture (no real-time video), and a delay of 2 to 5 seconds is acceptable.

As I'm still in the research phase, I don't know which direction I should go. Is it even possible to port these frameworks to a smartphone platform? Take OpenPose, for example, which uses Caffe and OpenCV. Let's say I want to port OpenPose to Android: I know there is a library, CNNdroid, that can convert CNN models made with Caffe to the CNNdroid format. OpenCV also shouldn't be a big problem, as there is an Android version available. So in theory it seems possible, but what about in practice?

My question is: does anyone have experience with human pose detection/matching on a smartphone? Is it even possible with the GPUs currently available on smartphones? I know this is a broad question, but some directions, suggestions, or experience could really help.

UPDATE: I'm thinking about the option of porting OpenPose (which uses Caffe as its ML framework) to TensorFlow. TensorFlow supports both Android and iOS.

2 Answers

You might be interested in looking at the techniques used by Krafka et al. for their Eye Tracking for Everyone project, in which they compress a larger network for estimating gaze coordinates into a smaller network that can run on a smartphone. This uses a concept developed by Geoff Hinton, which he called Dark Knowledge. Gaze detection is a special case of pose estimation, so in principle these techniques should be helpful. However, I do not know whether they will be sufficiently effective for your purposes (I think that largely depends on your accuracy constraints).
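As a rough illustration of the Dark Knowledge idea, the sketch below computes a distillation loss for a single example: the small "student" network is trained to match the temperature-softened output distribution of the large "teacher" network. This is the classification form of the technique. The function names and the temperature of 4.0 are assumptions, and a pose network that regresses coordinates or heatmaps would need an adapted loss, so this is not the exact procedure used by Krafka et al.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

# Hypothetical logits for one training example.
teacher = [5.0, 1.0, 0.0]
good_student = [5.0, 1.0, 0.0]   # agrees with the teacher
poor_student = [0.0, 1.0, 5.0]   # disagrees with the teacher
```

The loss is smallest when the student reproduces the teacher's softened distribution, so minimizing it pushes the thinner network toward the larger network's behavior, including the relative probabilities it assigns to the wrong classes.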

OpenPose is too heavy for a smartphone application. You need to redesign the software architecture to fit a phone. Regarding the CNN, which is the performance bottleneck, using a MobileNet-like structure and Dark Knowledge (as Mozglubov mentioned) to train a thinner network are two promising approaches. Either way, there is a lot of engineering work ahead. Good luck!
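To give a feel for why a MobileNet-like structure is lighter, the sketch below compares weight counts for a standard convolution and a depthwise separable convolution, the building block MobileNet uses. The layer sizes are hypothetical and biases are ignored; it only illustrates the arithmetic and says nothing about the resulting accuracy.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (no bias)."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes the channels
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels.
standard = standard_conv_params(3, 128, 256)    # 294912
separable = separable_conv_params(3, 128, 256)  # 33920
reduction = standard / separable                # roughly 8.7x fewer weights
```

The multiply-add count per output position shrinks by the same factor, 1 / (1/c_out + 1/kernel²), which for 3x3 kernels approaches 9x as the number of output channels grows.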