
Processing an image using CUDA: Python (PyCUDA) or C++?

Original: 2011-02-11 15:08:02 8 4 c++/ python/ cuda/ pycuda

I am working on a project to process images using CUDA. The project simply adds or subtracts images.

May I ask for your professional opinion: which is better, Python (PyCUDA) or C++, and what are the advantages and disadvantages of each?

I appreciate everyone's opinions and suggestions, since this project is very important to me.

4 answers

General answer: It doesn't matter. Use the language you're more comfortable with.

Keep in mind, however, that PyCUDA is only a wrapper around the CUDA C interface, so it may not always be up to date, and it adds another potential source of bugs, …

Python is great at rapid prototyping, so I'd personally go for Python. You can always switch to C++ later if you need to.

If the rest of your pipeline is in Python, and you're already using NumPy to speed things up, PyCUDA is a good complement for accelerating expensive operations. However, depending on the size of your images and your program flow, you might not get much of a speedup from PyCUDA. Passing the data back and forth across the PCIe bus involves latency, and that cost is only made up for with large data sizes.
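To make the latency point concrete, here is a rough back-of-the-envelope model in plain Python. The 6 GB/s bandwidth and 10 microsecond per-copy latency are illustrative assumptions, not measurements of any particular card, and the function names are made up for this sketch.

```python
def transfer_seconds(num_bytes, bandwidth_gb_per_s=6.0, latency_s=10e-6):
    """Estimated time for one host<->device copy: fixed latency plus bytes / bandwidth.

    Both defaults are illustrative assumptions, not measurements.
    """
    return latency_s + num_bytes / (bandwidth_gb_per_s * 1e9)


def latency_share(num_bytes, bandwidth_gb_per_s=6.0, latency_s=10e-6):
    """Fraction of the copy time that is fixed latency rather than data movement."""
    return latency_s / transfer_seconds(num_bytes, bandwidth_gb_per_s, latency_s)


# Square single-channel float32 images of increasing size.
for side in (64, 512, 4096):
    num_bytes = side * side * 4  # 4 bytes per float32 pixel
    share = latency_share(num_bytes)
    print(f"{side}x{side}: {num_bytes} bytes, {share:.0%} of copy time is latency")
```

Under these assumed numbers, the fixed latency dominates the copy for small images and becomes negligible for large ones, which is the sense in which the overhead is only made up for with large data sizes.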

In your case (addition and subtraction), there are built-in operations in PyCUDA that you can use to your advantage. However, in my experience, using PyCUDA for anything non-trivial requires knowing a lot about how CUDA works in the first place. For someone starting with no CUDA knowledge, PyCUDA might have a steep learning curve.
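A minimal sketch of what those built-in operations might look like, using PyCUDA's GPUArray class, which overloads + and - elementwise. The function name and the conversion to float32 are my own choices for illustration, and it needs a CUDA-capable GPU with PyCUDA and NumPy installed.

```python
def add_and_subtract_on_gpu(image_a, image_b):
    """Return (image_a + image_b, image_a - image_b), computed on the GPU.

    Both inputs are assumed to be NumPy arrays of the same shape.
    """
    # Imported here so this file still loads on a machine without PyCUDA.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.gpuarray as gpuarray

    # float32 avoids the wraparound that 8-bit pixel arithmetic can produce.
    a_gpu = gpuarray.to_gpu(np.ascontiguousarray(image_a, dtype=np.float32))
    b_gpu = gpuarray.to_gpu(np.ascontiguousarray(image_b, dtype=np.float32))

    # GPUArray overloads + and - as elementwise operations.
    total = (a_gpu + b_gpu).get()       # .get() copies the result back to the host
    difference = (a_gpu - b_gpu).get()
    return total, difference
```

Calling add_and_subtract_on_gpu(a, b) with two same-shaped arrays returns two float32 NumPy arrays. No hand-written kernel is involved, which is why this case is easier than the non-trivial ones mentioned above.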

Take a look at OpenCV; it contains a lot of image processing functions and all the helpers to load, save, and display images and operate cameras.

It also now supports CUDA: some of the image processing functions have been reimplemented in CUDA, and it gives you a good framework for writing your own.
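For the addition and subtraction in the question, a sketch with OpenCV's Python bindings could look like the following. It runs on the CPU; whether the CUDA-accelerated variants are available depends on how OpenCV was built, so they are not shown. The function name is hypothetical.

```python
def add_and_subtract_saturated(image_a, image_b):
    """Return (a + b, a - b) using OpenCV's saturating per-pixel arithmetic.

    Both inputs are assumed to be NumPy arrays with the same shape and dtype.
    """
    # Imported here so this file still loads where OpenCV is not installed.
    import cv2

    # For uint8 images, cv2.add and cv2.subtract clamp results to 0..255
    # instead of wrapping around as plain NumPy uint8 arithmetic does.
    return cv2.add(image_a, image_b), cv2.subtract(image_a, image_b)
```

The clamping matters for typical 8-bit images, where a bright pixel plus another bright pixel should stay white rather than wrap to a dark value.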

Alex's answer is right. The time consumed in the wrapper is minimal. Note that PyCUDA has some nice metaprogramming constructs for generating kernels, which might be useful.
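As one example of kernel generation, PyCUDA's ElementwiseKernel builds and compiles a CUDA kernel at run time from a short C expression. Whether this is the construct the answer had in mind is an assumption on my part; the function names below are made up for the sketch, and it needs a CUDA-capable GPU.

```python
def make_subtract_kernel():
    """Build a GPU kernel computing out[i] = a[i] - b[i] from a C snippet."""
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    from pycuda.elementwise import ElementwiseKernel

    return ElementwiseKernel(
        "float *out, float *a, float *b",  # C argument list of the kernel
        "out[i] = a[i] - b[i]",            # body applied at every index i
        "subtract_images",                 # name of the generated kernel
    )


def subtract_on_gpu(image_a, image_b):
    """Return image_a - image_b as float32, using the generated kernel."""
    import numpy as np
    import pycuda.gpuarray as gpuarray

    subtract = make_subtract_kernel()
    a_gpu = gpuarray.to_gpu(np.ascontiguousarray(image_a, dtype=np.float32))
    b_gpu = gpuarray.to_gpu(np.ascontiguousarray(image_b, dtype=np.float32))
    out_gpu = gpuarray.empty_like(a_gpu)  # uninitialized output buffer on the GPU
    subtract(out_gpu, a_gpu, b_gpu)
    return out_gpu.get()                  # copy the result back to the host
```

For plain subtraction this buys nothing over the overloaded operators, but the same pattern lets you fuse several per-pixel steps into one kernel by changing only the expression string.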

If all you're doing is adding or subtracting elements of an image, you probably shouldn't use CUDA for this at all. The time it takes to transfer the data back and forth across the PCIe bus will dwarf the savings you get from parallelism.

Any time you deal with CUDA, it's useful to think about the CGMA ratio (compute to global memory access ratio). Your addition/subtraction is only 1 floating-point operation for at least 2 memory accesses (1 read and 1 write), and 3 accesses when two images are combined into a third (2 reads and 1 write). That is a very poor ratio from a CUDA perspective.
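The ratio is simple enough to work out by hand, but here it is as a small sketch. The helper name is hypothetical, and the second case (two input images, one output image) is my own addition to the count of 1 read and 1 write.

```python
from fractions import Fraction


def cgma_ratio(flops, global_reads, global_writes):
    """Floating-point operations per global memory access, for one output element."""
    return Fraction(flops, global_reads + global_writes)


# Adding a constant to an image in place: 1 add, 1 read, 1 write per pixel.
in_place = cgma_ratio(flops=1, global_reads=1, global_writes=1)

# Adding two images into a third: 1 add, 2 reads, 1 write per pixel.
two_images = cgma_ratio(flops=1, global_reads=2, global_writes=1)

print(f"in-place constant add: {in_place} FLOP per access")
print(f"two-image add:         {two_images} FLOP per access")
```

Either way the ratio is 1/2 or 1/3, well below 1, so the kernel spends its time waiting on memory rather than computing.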