
Can CUDA cores run things absolutely parallel or do they need context switching?

Original 2022-09-23 19:01:31 1 1 parallel-processing/ cuda/ gpu/ core

Can a CUDA INT32 core process two different integer instructions completely in parallel, without context switching? I know that this is not possible on a CPU, but what about an NVIDIA GPU? I know that an SM can run warps, and that if a core has to wait for some data, it gets another thread from the dispatch unit.

1 answer

I know that this is not possible on a CPU, but what about an NVIDIA GPU?

This assertion is wrong for modern mainstream CPUs (it has been for at least a decade on nearly all x86-64 processors, for example Intel Skylake or AMD Zen 2). Modern x86-64 Intel/AMD processors can generally compute two 256-bit AVX SIMD operations in parallel, since they generally have 2 SIMD units. Processors like Intel Skylake also have 4 ALUs, so they can execute up to 4 basic arithmetic operations (e.g. add, sub, and, xor) in parallel per cycle. Some instructions, like division, are far more expensive and do not run in parallel on such architectures, though they are well pipelined. The instructions can come from the same thread on one logical core, or from 2 threads (possibly belonging to 2 different processes) scheduled on 2 logical cores, without any context switch. Note that recent high-end ARM processors can also do this (even some mobile processors).
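As a rough illustration of the CPU side, here is a minimal C++ sketch, assuming a superscalar core like the ones described above. The function names `sum_single_chain` and `sum_four_chains` are made up for this example. The first loop forms one dependency chain, while the second keeps four independent accumulators, so the hardware is free to issue several of those adds in the same cycle, from a single thread and with no context switch. Whether this actually happens depends on the compiler and the CPU (an optimizing compiler may also vectorize either loop), so treat it as a sketch rather than a benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One dependency chain: every add needs the result of the previous add,
// so the adds cannot overlap with each other.
std::uint32_t sum_single_chain(const std::vector<std::uint32_t>& v) {
    std::uint32_t acc = 0;
    for (std::uint32_t x : v) {
        acc += x;
    }
    return acc;
}

// Four independent chains: the adds on a0..a3 do not depend on each other,
// so a core with several ALUs MAY execute them in the same cycle.
// (Assumption: the compiler keeps the four accumulators separate.)
std::uint32_t sum_four_chains(const std::vector<std::uint32_t>& v) {
    std::uint32_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    const std::size_t n = v.size();
    const std::size_t main_end = n - n % 4;  // largest multiple of 4 <= n
    assert(main_end % 4 == 0);

    std::size_t i = 0;
    for (; i < main_end; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    // Leftover elements (at most 3) go into the first accumulator.
    for (; i < n; ++i) {
        a0 += v[i];
    }
    // Unsigned arithmetic wraps, so the result equals the single-chain sum.
    return a0 + a1 + a2 + a3;
}
```

Both functions return the same value for any input; only the shape of the dependency graph differs, and that shape is what lets the ALUs work in parallel.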

Can a CUDA INT32 core process two different integer instructions completely in parallel, without context switching?

NVIDIA GPUs execute groups of threads known as warps in SIMT (Single Instruction, Multiple Thread) fashion. Thus, 1 instruction operates on 32 items in parallel (though, in theory, the hardware is free not to process all of them at exactly the same time). A kernel execution consists of many blocks, and blocks are scheduled onto SMs. An SM can operate on many blocks concurrently, so there is a massive amount of parallelism available.
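The SIMT model can be mimicked on the host with a small C++ sketch. This is only an emulation under simplifying assumptions, not how the hardware is implemented: `WarpRegister`, `warp_add` and `warps_per_block` are hypothetical names, and the per-lane loop stands in for what a GPU does with one warp-wide instruction. The `active_mask` parameter models lanes that are switched off, for example by a divergent branch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWarpSize = 32;  // threads per warp, as stated in the text

// One 32-bit register as seen by a whole warp: one value per lane.
using WarpRegister = std::array<std::int32_t, kWarpSize>;

// Emulates ONE warp-wide integer add. The same instruction is applied to
// every lane whose bit is set in active_mask; inactive lanes keep their
// previous dst value. Real hardware may process the lanes simultaneously;
// this loop only models the result. Inputs are assumed not to overflow.
void warp_add(WarpRegister& dst, const WarpRegister& a,
              const WarpRegister& b, std::uint32_t active_mask) {
    for (int lane = 0; lane < kWarpSize; ++lane) {
        if ((active_mask >> lane) & 1u) {
            dst[lane] = a[lane] + b[lane];
        }
    }
}

// Number of warps needed to cover a block of the given size
// (a partially filled last warp still counts as a warp).
int warps_per_block(int threads_per_block) {
    assert(threads_per_block >= 0);
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}
```

For instance, a block of 256 threads is covered by 8 warps, and each warp-wide add touches up to 32 values with a single instruction.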

Whether a specific GPU can execute two INT32 warp instructions in parallel depends on the target architecture, not on CUDA itself. On modern NVIDIA GPUs, each SM is split into multiple partitions, each of which can execute instructions on warps independently of the other partitions. For example, AFAIK, on a Pascal GP104 there are 20 SMs, and each SM has 4 partitions, each capable of running SIMD instructions that operate on 1 warp (32 items) at a time. In practice, things can be a bit more complex on newer architectures. You can get more information here.
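The GP104 figures quoted above can be turned into a small worked calculation. The sketch below is a deliberately simplified model: it assumes one warp-wide instruction per partition at a time and ignores everything else the scheduler does. `GpuLayout` and the two helper functions are hypothetical names introduced only for this example; the numbers 20, 4 and 32 are the ones given in the answer.

```cpp
#include <cassert>

// Simplified description of a GPU: how many SMs, how many partitions per SM,
// and how many lanes (threads) one warp-wide instruction covers.
struct GpuLayout {
    int sm_count;
    int partitions_per_sm;
    int lanes_per_warp;
};

// Upper bound on warp-wide instructions executing at the same time,
// assuming each partition handles one warp instruction at a time.
constexpr int concurrent_warp_instructions(GpuLayout g) {
    return g.sm_count * g.partitions_per_sm;
}

// Upper bound on individual 32-bit lanes being processed at the same time.
constexpr int concurrent_lanes(GpuLayout g) {
    return concurrent_warp_instructions(g) * g.lanes_per_warp;
}

// Figures for a Pascal GP104 as quoted in the answer (AFAIK).
constexpr GpuLayout kPascalGp104{20, 4, 32};

static_assert(concurrent_warp_instructions(kPascalGp104) == 80,
              "20 SMs x 4 partitions");
static_assert(concurrent_lanes(kPascalGp104) == 2560,
              "80 warp instructions x 32 lanes");
```

Under this model, up to 80 different warp instructions (2560 lanes in total) can be in execution at once, with no context switch involved, which is why the answer to the question depends on the architecture rather than on CUDA.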



The technical posts on this site follow the CC BY-SA 4.0 license. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 1 2022-09-23 19:38:19
