
What good are thread affinity mask changes for the current thread?

I'm writing a game engine and I need a way to get a precise and accurate "delta time" value from which to derive the current FPS for debugging and also to limit the framerate (this is important for our project).

Doing a bit of research, I found out that one of the best ways to do this is to use WinAPI's QueryPerformanceCounter function. GetTickCount has to be used to guard against forward counter leaps, but it is not very accurate by itself.
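For reference, the basic pattern I mean is something like this (a minimal sketch; error handling is omitted, and the Sleep merely stands in for frame work):

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        LARGE_INTEGER frequency, previous, current;
        QueryPerformanceFrequency(&frequency);  // ticks per second, fixed at boot
        QueryPerformanceCounter(&previous);

        for (int frame = 0; frame < 10; ++frame)  // stand-in for the game loop
        {
            Sleep(16);  // stand-in for updating and rendering a frame

            QueryPerformanceCounter(&current);
            double deltaTime = double(current.QuadPart - previous.QuadPart)
                             / double(frequency.QuadPart);  // seconds
            previous = current;

            printf("delta: %.6f s  fps: %.1f\n", deltaTime, 1.0 / deltaTime);
        }
        return 0;
    }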

Now, the problem with QueryPerformanceCounter is that it can apparently return values that look as if time warped backwards (i.e., a call may return a value earlier in time than a previous call). This happens only when a value obtained on one processor core is compared against a value obtained on another core, which leads me to the questions that motivated this post:

  1. May the OS "reallocate" a thread to another core while the thread is already running, or is a thread allocated to a given core once and for all until it dies?
  2. If a thread can't be reallocated (and that makes a lot of sense to me, at least), then why is it possible for me to do something like SetThreadAffinityMask(GetCurrentThread(), mask)? Ogre3D does that in its Ogre::Timer class (Windows implementation), and I'm assuming that's to avoid time going backwards (I sketch the pattern below). But for that to be true, I would have to accept that threads can be moved from one core to another arbitrarily by the OS, which seems rather odd to me (not sure why).
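The Ogre pattern I'm referring to looks roughly like this (my paraphrase of the idea, not Ogre's exact code): pin the thread to one core just for the read, then restore the previous affinity so the scheduler is free again.

    #include <windows.h>

    LONGLONG QueryCounterOnOneCore()
    {
        // Pick the lowest core this process is allowed to run on.
        DWORD_PTR processMask = 0, systemMask = 0;
        GetProcessAffinityMask(GetCurrentProcess(), &processMask, &systemMask);
        DWORD_PTR timerMask = processMask & (~processMask + 1); // lowest set bit

        // Pin, read the counter, then restore the old affinity.
        DWORD_PTR oldMask = SetThreadAffinityMask(GetCurrentThread(), timerMask);
        LARGE_INTEGER counter;
        QueryPerformanceCounter(&counter);
        SetThreadAffinityMask(GetCurrentThread(), oldMask);

        return counter.QuadPart;
    }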

I think that was all I wanted to know for now. Thanks.

Unless a thread has a processor affinity mask, the scheduler will move it from processor to processor in order to give it execution time. Since moving a thread between processors costs performance, it will try not to move it, but giving it a processor to execute on has priority over not moving it. So threads usually do move.

As for timer APIs: timeGetTime is designed for multimedia timing, so it's a bit more accurate than GetTickCount.
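If you do use timeGetTime, its resolution can be raised with timeBeginPeriod (a small sketch of that, assuming you link winmm.lib; always pair it with timeEndPeriod):

    #include <windows.h>
    #include <cstdio>
    #pragma comment(lib, "winmm.lib")

    int main()
    {
        timeBeginPeriod(1);             // request 1 ms timer resolution
        DWORD start = timeGetTime();    // milliseconds since system start
        Sleep(50);                      // stand-in for the work being timed
        printf("elapsed: %lu ms\n", (unsigned long)(timeGetTime() - start));
        timeEndPeriod(1);               // must match the timeBeginPeriod call
        return 0;
    }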

QueryPerformanceCounter is still your most precise measurement, though. Microsoft has this to say about it:

On a multiprocessor computer, it should not matter which processor is called. However, you can get different results on different processors due to bugs in the basic input/output system (BIOS) or the hardware abstraction layer (HAL). To specify processor affinity for a thread, use the SetThreadAffinityMask function.

So if you are doing the timing tests on a specific computer, you may not have to worry about QPC going backwards; do some testing and see whether it matters on your machine.
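A crude test along those lines (just a sketch; it deliberately hops the thread between the first two cores each iteration, so it assumes at least a dual-core machine):

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        LARGE_INTEGER prev, now;
        QueryPerformanceCounter(&prev);

        for (long long i = 0; i < 10000000LL; ++i)
        {
            // Alternate between core 0 and core 1 so successive reads
            // come from different processors.
            SetThreadAffinityMask(GetCurrentThread(), (i & 1) ? 2 : 1);

            QueryPerformanceCounter(&now);
            if (now.QuadPart < prev.QuadPart)
                printf("QPC went backwards by %lld ticks\n",
                       prev.QuadPart - now.QuadPart);
            prev = now;
        }
        printf("test finished\n");
        return 0;
    }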

Threads can be, and are (unless they have an affinity set), reallocated to other cores while running. Windows spreads the load over all the processors to maximize performance.

Even if you lock the thread to one processor using SetThreadAffinityMask, QPC can run backwards if you're really unlucky and the hardware is flaky. It's better to just handle the case of QPC returning bad values. In Windows 7, QPC has been significantly improved in this regard, but since you're writing a game you're probably also targeting XP, where that won't help you.
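One way to handle bad values (a defensive pattern I'd suggest, not an official recipe): treat a backward jump as a zero-length frame and clamp long stalls to something sane.

    #include <windows.h>

    // Returns the seconds elapsed since the previous call, never negative
    // and never longer than 250 ms.
    double NextDeltaSeconds(LARGE_INTEGER& previous, const LARGE_INTEGER& frequency)
    {
        LARGE_INTEGER current;
        QueryPerformanceCounter(&current);

        LONGLONG ticks = current.QuadPart - previous.QuadPart;
        previous = current;

        if (ticks < 0)          // the counter warped backwards
            return 0.0;

        double delta = double(ticks) / double(frequency.QuadPart);
        return (delta > 0.25) ? 0.25 : delta;   // clamp long stalls
    }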

Also, don't set the thread affinity; you can deadlock yourself, introduce weird timing and perf bugs, and generally cause yourself grief.

1) The scheduler may allocate a thread to whichever core has spare processing time. This is why you will often see software using 50% CPU on a quad-core machine, yet when you check the per-core graphs it's using half of all four cores.

2) See 1 ;)

Using SetThreadAffinityMask is usually a bad idea, except in the case where the thread only does timing. If you lock your thread to a single core, you remove all the benefit of having a multicore system in the first place: your application can no longer scale. Even if you launch multiple instances of your app, they will still be locked to a single core.

We typically have to lock our game into a single thread when running timings because of this; we've found no effective way around it, since you need sub-microsecond resolution when measuring performance.

One thing that makes it a little easier is that our engine is cut into broad components that always run concurrently (e.g., the game/logic "server", the input/graphics "client", audio, and rendering each get their own thread), so what we do is lock each of those threads onto its own core and time them independently.
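In rough outline (PinCurrentThreadToCore is a hypothetical helper name, and the core layout is just an example):

    #include <windows.h>

    void PinCurrentThreadToCore(unsigned core)
    {
        SetThreadAffinityMask(GetCurrentThread(), DWORD_PTR(1) << core);
    }

    // Called at the top of each engine thread's entry function, e.g.:
    //   render thread:           PinCurrentThreadToCore(0);
    //   game/logic "server":     PinCurrentThreadToCore(1);
    //   input/graphics "client": PinCurrentThreadToCore(2);
    //   audio:                   PinCurrentThreadToCore(3);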

Similarly, because we know that, for example, the render loop is always going to be on core 0, we use that core for timing the framerate.
