
Is my understanding of processors, cores, threads and parallelism correct?

I've been trying to get a deep understanding of how these concepts relate. Let me give a simple example and explain what I'm thinking so that you can correct it.

Let's say I want to try to sort two arrays

int[] A = { ... }; // very large, very unsorted 
int[] B = { ... }; // very large, very unsorted 

by sorting each of them "as parallel as my system will allow me to sort them." I take advantage of the fact that a Parallel.ForEach does a lot of stuff under the hood, and I simply write

var arrays = new List<int[]> { A, B };
Parallel.ForEach(arrays, arr => Array.Sort(arr));
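
(For context, Parallel.ForEach queues the loop body to thread-pool threads and blocks until they all finish. A rough, simplified sketch of the equivalent with explicit tasks — not the library's actual implementation, which adds partitioning and load balancing:)

// requires: using System; using System.Linq; using System.Threading.Tasks;
var tasks = arrays.Select(arr => Task.Run(() => Array.Sort(arr))).ToArray();
Task.WaitAll(tasks); // returns only after both sorts have completed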

Now let's say I compile and run it on machines with the following specs:

  1. 1 processor, 1 core
  2. 1 processor, multiple cores
  3. 2 processors, at least one core on each

In case 1, there is absolutely no possibility of a performance gain. It sorts A, then sorts B, just like it would in a regular foreach loop.

In case 2, there is also no performance gain, because unless you have multiple processors your machine can not literally "do more than 1 thing at once." Even if it ends up sorting them on different threads, the CPU that controls the threads does a little sorting of A, a little sorting of B, a little more of A, and so on, which can't be more efficient than just sorting all of A and then all of B.

Case 3 is the only one with a possibility of a performance gain, for the reason mentioned in the previous case.

Can someone critique my understanding? How right or wrong is this? (I didn't major in computer science. So please grade me on a curve.)

In case 1... It sorts A, then sorts B

That is not how threading works. The OS rapidly context-switches between the two threads; on Windows that happens by default 64/3 times per second, i.e. roughly every 47 milliseconds. The interleaving makes it look like A and B get sorted at the same time. It isn't otherwise easy to observe: the debugger would have to give you a look inside Array.Sort(), and it won't. It isn't any faster, of course, but the slowdown is fairly minor. This is the cheap kind of context switch, with no need to reload the page mapping tables since the threads belong to the same process. You only pay for the possibly trashed cache; adding ~5 microseconds every 3/64 second (about 0.01% slower) is quite hard to measure accurately.
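
If you want to see case 1 for yourself on a multi-core machine, you can pin the process to a single logical processor before timing the loop. A minimal sketch, assuming Windows or Linux, where affinity mask bit 1 selects the first logical processor:

// requires: using System; using System.Diagnostics; using System.Threading.Tasks;
// Pin the whole process to one logical processor; the OS now has to
// context-switch between the two sorting threads on that single core.
Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)1;

var sw = Stopwatch.StartNew();
Parallel.ForEach(arrays, arr => Array.Sort(arr));
Console.WriteLine($"Pinned to one core: {sw.ElapsedMilliseconds} ms");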

In case 2, ...then your machine can not literally "do more than 1 thing at once

It can: each core can execute Sort() concurrently. That is largely the point of multi-core processors. They do, however, have to share a single resource, the memory bus, so what matters a great deal is the size of the arrays and the speed of the RAM chips. Large arrays don't fit in the processor caches, and it is technically possible for the memory bus to get saturated by requests from the cores. The element type does not help here: comparing two int values takes only a single CPU instruction, so the work is dominated by memory traffic rather than computation. The expectation is a 2x speed-up, but if the parallel version falls short of that, you know that the RAM is the bottleneck.
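
The cheap way to check is to time the sequential and parallel versions on identical copies of the data. A minimal sketch (the array size is an arbitrary example, picked to be much larger than the caches):

// requires: using System; using System.Diagnostics; using System.Threading.Tasks;
const int n = 20_000_000;                 // far larger than any CPU cache
var rng = new Random(42);
int[] A = new int[n], B = new int[n];
for (int i = 0; i < n; i++) { A[i] = rng.Next(); B[i] = rng.Next(); }
int[] A2 = (int[])A.Clone(), B2 = (int[])B.Clone();  // identical copies for a fair race

var sw = Stopwatch.StartNew();
Array.Sort(A); Array.Sort(B);             // sequential baseline
Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms");

sw.Restart();
Parallel.ForEach(new[] { A2, B2 }, arr => Array.Sort(arr));
Console.WriteLine($"Parallel:   {sw.ElapsedMilliseconds} ms");

A parallel time well above half the sequential time means the cores spent their time waiting on the memory bus rather than comparing ints.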

Case 3 is the only one with a possibility of a performance gain

Not likely. Multiple-processor machines often have a NUMA architecture, giving each processor its own memory bus, with an interconnect between them used to shovel data from one bus to another. But such processors also have multiple cores, and it is the OS's job to figure out how to use them effectively. Since the threads belong to the same process, and so share data, the scheduler will strongly favor putting them on cores of the same processor and avoid putting a load on the interconnect. So the expectation is that it will perform the same as case 2.

These are rough guidelines; modern machine design demands that you actually measure.
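
One way to measure is to cap the degree of parallelism with ParallelOptions and watch how the timing scales. A sketch, reusing A and B from the snippet above (regenerate them first if they have already been sorted):

// requires: using System; using System.Diagnostics; using System.Threading.Tasks;
for (int dop = 1; dop <= Environment.ProcessorCount; dop++)
{
    var copies = new[] { (int[])A.Clone(), (int[])B.Clone() }; // same unsorted input each run
    var options = new ParallelOptions { MaxDegreeOfParallelism = dop };
    var sw = Stopwatch.StartNew();
    Parallel.ForEach(copies, options, arr => Array.Sort(arr));
    Console.WriteLine($"{dop} core(s) allowed: {sw.ElapsedMilliseconds} ms");
}

With only two arrays there are only two work items, so anything beyond a degree of 2 cannot help; where the curve flattens tells you where the real bottleneck sits.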
