
Why do GPU-based algorithms perform faster?

Original 2012-02-11 08:48:38 6 3 cuda/ gpgpu/ nvidia

I just implemented an algorithm on the GPU that computes the difference between consecutive elements of an array. I compared it with a CPU-based implementation and noticed that for large arrays, the GPU-based implementation is faster.

I am curious why the GPU-based implementation is faster. Please note that I know the surface-level reasoning: a GPU has several cores and can thus do the operation in parallel, i.e., instead of visiting each index sequentially, we can assign a thread to compute the difference for each index.

But can someone give me a deeper reason why GPUs perform faster? What is so different about their architecture that they can beat a CPU-based implementation?
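For reference, the computation described in the question can be sketched on the CPU side as follows. This is a minimal sketch, not the asker's actual code: the names `diff_at` and `adjacent_difference_cpu` and the use of `float` are assumptions made for illustration. The body of `diff_at` is the per-index work that a one-thread-per-index GPU version would hand to each thread, while the loop is what the sequential CPU baseline does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work for a single output index i. In a one-thread-per-index GPU version,
// each thread would perform just this step for its own i.
// (Hypothetical helper name, not from the original post.)
inline float diff_at(const std::vector<float>& in, std::size_t i) {
    assert(i + 1 < in.size());
    return in[i + 1] - in[i];
}

// Sequential CPU baseline: visits every index in order.
// Returns an empty vector when there are fewer than two elements.
std::vector<float> adjacent_difference_cpu(const std::vector<float>& in) {
    if (in.size() < 2) return {};
    std::vector<float> out(in.size() - 1);
    for (std::size_t i = 0; i + 1 < in.size(); ++i) {
        out[i] = diff_at(in, i);
    }
    return out;
}
```

Every call to `diff_at` depends only on the input array and never on another output, which is what makes the one-thread-per-index mapping possible.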

3 answers

They don't perform faster, generally.

The point is: Some algorithms fit better into a CPU, some fit better into a GPU.

The execution model of GPUs differs (see SIMD), the memory model differs, the instruction set differs... The whole architecture is different.
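To illustrate what a lockstep (SIMD-style) execution model implies, here is a toy cost model in C++. It is an assumption-laden sketch, not a GPU simulator: `lockstep_cost`, the "lane" and "path" terminology, and the idea of measuring cost in instruction counts are all introduced here for illustration. The model assumes that a group of lanes shares one instruction stream, so the group has to step through every code path that at least one of its lanes takes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: path_of_lane[l] is the code path chosen by lane l, and
// path_length[p] is the number of instructions on path p.
// A group running in lockstep pays for each distinct path taken by any lane.
std::size_t lockstep_cost(const std::vector<std::size_t>& path_of_lane,
                          const std::vector<std::size_t>& path_length) {
    std::vector<bool> taken(path_length.size(), false);
    for (std::size_t p : path_of_lane) {
        assert(p < path_length.size());
        taken[p] = true;
    }
    std::size_t cost = 0;
    for (std::size_t p = 0; p < path_length.size(); ++p) {
        if (taken[p]) cost += path_length[p];
    }
    return cost;
}
```

Under this model, four lanes that all pick the same 10-instruction path cost 10, while four lanes that each pick a different 10-instruction path cost 40, even though each lane only needed 10.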

There is no obvious way to compare a CPU with a GPU. You can only discuss whether (and why) CPU implementation A of an algorithm is faster or slower than GPU implementation B of the same algorithm.

This ended up kind of vague, so here is the tip of the iceberg of concrete reasons: the strengths of a CPU are random memory access, branch prediction, etc. A GPU excels when there is a large amount of computation with high data locality, so that your implementation can achieve a good compute-to-memory-access ratio. SIMD makes GPU implementations slower than CPU ones where there is a lot of unpredictable branching into many code paths, for example.
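To make the compute-to-memory-access ratio concrete, here is a rough back-of-the-envelope estimate in C++. The function names and the traffic assumptions are mine, not from the original answer: values are 4-byte floats, and every input and output element crosses the memory bus exactly once (an idealized best case with perfect reuse).

```cpp
#include <cassert>
#include <cstddef>

// Adjacent difference on n floats: one subtraction per output,
// n elements read and n - 1 elements written, 4 bytes each.
double intensity_adjacent_difference(std::size_t n) {
    assert(n >= 2);
    double ops = static_cast<double>(n - 1);
    double bytes = 4.0 * static_cast<double>(n) + 4.0 * static_cast<double>(n - 1);
    return ops / bytes;  // operations per byte of memory traffic
}

// Naive n x n matrix multiply: n^3 multiplies plus n^3 additions,
// with A and B each read once and C written once (idealized).
double intensity_matmul(std::size_t n) {
    assert(n >= 1);
    double dn = static_cast<double>(n);
    double ops = 2.0 * dn * dn * dn;
    double bytes = 4.0 * 3.0 * dn * dn;
    return ops / bytes;  // grows linearly with n
}
```

Under these assumptions the adjacent difference stays just below 0.125 operations per byte no matter how large the array is, whereas a 1024 x 1024 matrix multiply reaches roughly 170 operations per byte.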

The real reason is that a GPU has not just several cores but many cores, typically hundreds of them! Each GPU core, however, is much slower than even a low-end CPU core.

But the programming model is not at all like that of multi-core CPUs, so most programs cannot be ported to GPUs or benefit from them.

While some answers have already been given here and this is an old thread, I just thought I'd add this for posterity and what not:

The main reason that CPUs and GPUs differ so much in performance for certain problems is the design decisions made about how to allocate the chip's resources. CPUs devote much of their chip space to large caches, instruction decoders, peripheral and system management, etc. Their cores are much more complicated and run at much higher clock rates (which produces more heat per core that must be dissipated). By contrast, GPUs devote their chip space to packing as many floating-point ALUs onto the chip as they can possibly get away with.

The original purpose of GPUs was to multiply matrices as fast as possible (because that is the primary type of computation involved in graphics rendering). Since matrix multiplication is an embarrassingly parallel problem (i.e., each output value is computed completely independently of every other output value) and the code path for each of those computations is identical, chip space can be saved by having several ALUs follow the instructions decoded by a single instruction decoder, since they are all performing the same operations at the same time. By contrast, each of a CPU's cores must have its own separate instruction decoder since the cores are not following identical code paths, which makes each of a CPU's cores much larger on the die than a GPU's cores.

Since the primary computations performed in matrix multiplication are floating-point multiplication and floating-point addition, GPUs are implemented such that each of these is a single-cycle operation in throughput terms and, in fact, they even contain a fused multiply-add instruction that multiplies two numbers and adds the product to a third number in a single cycle. This is much faster than a typical CPU, where floating-point multiplication is often a many-cycle operation. Again, the trade-off here is that the chip space is devoted to the floating-point math hardware, and other instructions (such as control flow) are often much slower per core than on a CPU, or sometimes just don't exist on a GPU at all.
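The structure described here can be sketched in plain C++. This is only an illustration under my own assumptions (row-major square matrices of `float`, and the hypothetical names `matmul_element` and `matmul`), not GPU code: each output element is computed by the same short loop of fused multiply-adds, independently of every other output element.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One output element of C = A * B for row-major n x n matrices.
// The loop body is a single fused multiply-add: acc = a * b + acc.
float matmul_element(const std::vector<float>& a, const std::vector<float>& b,
                     std::size_t n, std::size_t row, std::size_t col) {
    float acc = 0.0f;
    for (std::size_t k = 0; k < n; ++k) {
        acc = std::fma(a[row * n + k], b[k * n + col], acc);
    }
    return acc;
}

// Every (row, col) pair runs the identical code path and touches no other
// output, so the two outer loops could be distributed across many ALUs.
std::vector<float> matmul(const std::vector<float>& a,
                          const std::vector<float>& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<float> c(n * n);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            c[row * n + col] = matmul_element(a, b, n, row, col);
        }
    }
    return c;
}
```

Because `matmul_element` has no data-dependent branches, all outputs follow the same instruction sequence, which is the property that lets several ALUs share one instruction decoder.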

Also, since GPU cores run at much lower clock rates than typical CPU cores and don't contain as much complicated circuitry, they don't produce as much heat per core (or use as much power per core.) This allows more of them to be packed into the same space without overheating the chip and also allows a GPU with 1,000+ cores to have similar power and cooling requirements to a CPU with only 4 or 8 cores.
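As a rough illustration of why many slow cores can still win, here is an idealized peak-throughput model in C++. The function name and the model itself are assumptions for illustration: it supposes every core completes a fixed number of operations per cycle and ignores memory, control flow, and everything else that limits real hardware.

```cpp
#include <cassert>

// Idealized peak rate: every core completes ops_per_cycle operations
// on every clock cycle. Real chips rarely sustain this.
double peak_ops_per_second(double cores, double clock_hz, double ops_per_cycle) {
    assert(cores > 0.0 && clock_hz > 0.0 && ops_per_cycle > 0.0);
    return cores * clock_hz * ops_per_cycle;
}
```

With made-up numbers (1,000 cores at 1 GHz versus 8 cores at 3 GHz, one operation per cycle each), the model gives a peak ratio of about 42 to 1 in favor of the many-core chip, even though each of its cores is three times slower.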


The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1
5 ACCEPTED 2012-02-11 09:09:38

solution2
4 2012-02-11 08:52:57

solution3
2 2012-12-18 19:34:48
