简体   繁体   中英

Context switch: what happens in a worst case scenario?

I want to understand how a certain worst case scenario of context switch happens. Say I have 10 CPU cores running a single process. Everything is CPU intensive, no thread is sleeping (waiting for I/O).

(I am mainly concerned with mainstream modern personal computer architectures and systems, typically x64 with Windows, Linux...)

Correct me if I'm wrong: running 10 CPU/RAM intensive independent threads is most often a near optimal situation. The amount of time spent in context switch is rather negligible. While the system may sometimes decide to re-attribute threads to different cores in a round-robin fashion causing a reset of RAM caches, it has a minor effect and works almost as if each thread was running on a single fixed core.

Only the main RAM bus may be a limitation since all threads share it, but it's not the point I'm interested in here. Reducing the number of threads will not increase the throughput anyway.

Now assume you still have 10 cores but run 1000 threads. The scheduler could theoretically decide to switch rarely (say every second) running 10 threads for a second, then 10 others... and the whole thing would still be close to optimal performance (throughput).

But it does not seem to be the case and it looks like threads are switched intensively causing a strongly suboptimal performance (throughput). Am I right about it? What is the main cause for this suboptimal performance? A few numbers would be nice if you have any idea of orders of magnitude of (for example): switches per second, performance loss caused by switching...

I'm going to answer my own question (after some search).

On windows, the number of context switches can be measured with performance counters: https://technet.microsoft.com/en-us/library/cc938606.aspx

I measured it on my machine (core i7/Windows 10) and the order of magnitude is around 1000/s by core when the number of running threads is more than the number of cores (and these threads are full CPU).

The time needed for a context switch varies quite a bit depending on:

  • what registers need to be saved
  • if FPU registers need to be saved
  • the processor model (of course)

You can read: https://www.quora.com/How-long-does-a-context-switch-take or http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html

A slightly pessimistic avg. order of magnitude seems to be 1000 ns. Thus the total time for all context switches on each core is 1ms per second, that is 0.1%.

This does not depend on the number of threads: if you run 100 or 1000 threads, the number of switches does not change. As a conclusion the time spent in context switching is somehow negligible.

This reasoning is correct as long as the threads are pure CPU with only small memory read/write like a few local variables. I ran a test with full CPU threads and the difference between a few and 1000 threads is not noticeable.

But the situation changes when RAM is involved and switches makes CPU (memory) cache less efficient. A worse case is when:

  • computation can be split into 1000 independent "data" parts
  • each part of the data fits just into the memory cache (say L1 or L2) of a core
  • each part needs to be read many times

In this situation, running 10 threads to completion, then ten others... would take full advantage of the cache, while running 1000 threads at a time would causes the cache to be useful only during 1ms.

But if the data of several threads could fit into the cache, or if the threads read common data to some degree, or if each thread reads the data just once, then it is possible that running 1000 threads vs. running 10 threads a hundred times will have similar throughput.

It is more a matter a adapting parallelism to memory access. And it depends very much on the way memory needs to be accessed.

The time spend in context switching is negligible, the time lost because of wrong usage of caches may sometimes be problem, sometimes not, depending on how the memory is accessed and shared.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM