
What good are thread affinity mask changes for the current thread?

I'm writing a game engine and I need a precise and accurate "deltatime" value, both to derive the current FPS for debugging and to limit the framerate (this is important for our project).

After a bit of research, I found that one of the best ways to do this is to use WinAPI's QueryPerformanceCounter function. GetTickCount has to be used alongside it to guard against forward counter leaps, but on its own it is not very accurate.

Now, the problem with QueryPerformanceCounter is that it can apparently return values that make time appear to run backwards (i.e. a call may return a value that is earlier than the value returned by a previous call). This happens only when a value obtained on one processor core is compared against a value obtained on another core, which leads me to the questions that motivated this post:

  1. May the OS "reallocate" a thread to another core while the thread is already running, or is a thread allocated to a given core and that's that until the thread dies?
  2. If a thread can't be reallocated (and that makes a lot of sense to me, at least), then why is it possible for me to do something like SetThreadAffinityMask(GetCurrentThread(),mask)? Ogre3D does that in its Ogre::Timer class (Windows implementation), and I'm assuming that's to avoid time going backwards. But for that to be true, I would have to consider the possibility of threads being moved from one core to another arbitrarily by the OS, which seems rather odd to me (not sure why).

I think that was all I wanted to know for now. Thanks.

Unless a thread has a processor affinity mask, the scheduler will move it from processor to processor in order to give it execution time. Since moving a thread between processors costs performance, the scheduler tries not to move it, but giving it a processor to execute on takes priority over keeping it in place. So, usually, threads move.

As for timer APIs: timeGetTime is designed for multimedia timing, so it's a bit more accurate than GetTickCount.

QueryPerformanceCounter() is still your most precise measurement, though. Microsoft has this to say about it:

On a multiprocessor computer, it should not matter which processor is called. However, you can get different results on different processors due to bugs in the basic input/output system (BIOS) or the hardware abstraction layer (HAL). To specify processor affinity for a thread, use the SetThreadAffinityMask function.

So if you are doing the timing tests on a specific computer, you may not have to worry about QPC going backwards. You should do some testing and see if it matters on your machine.
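One way to run that kind of test is to sample the counter in a loop and count how often a reading is lower than the one before it. The sketch below is only an illustration: count_backward_steps and its read_ticks parameter are hypothetical names, not part of any Windows API. On Windows, read_ticks would presumably wrap QueryPerformanceCounter; here it is left as a callable so the logic can be exercised with any tick source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical helper: calls read_ticks() `samples` times in total and
// returns how many readings were lower than the reading just before them.
std::size_t count_backward_steps(const std::function<std::int64_t()>& read_ticks,
                                 std::size_t samples) {
    assert(read_ticks && "read_ticks must be callable");
    if (samples == 0) {
        return 0;
    }
    std::size_t regressions = 0;
    std::int64_t previous = read_ticks();
    for (std::size_t i = 1; i < samples; ++i) {
        const std::int64_t current = read_ticks();
        if (current < previous) {
            ++regressions;  // the counter appeared to run backwards
        }
        previous = current;
    }
    return regressions;
}
```

A run from a single thread can only catch a regression if the scheduler happens to move that thread between cores while it is sampling, so a result of zero suggests the machine is fine but does not prove it.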

Threads can be, and are, reallocated while they are running (unless they have an affinity set). Windows spreads the load over all the processors to maximize performance.

Even if you lock the thread to one processor using SetThreadAffinityMask, QPC can run backwards if you're really unlucky and the hardware sucks. Better to just deal with the case of QPC returning bad values. In Windows 7, QPC has been significantly improved in this regard, but since you're writing a game you're probably targeting XP, where that won't help you.
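Dealing with bad values could look roughly like the following sketch. DeltaTimer is a hypothetical class, not something from the question or from Ogre3D. It takes raw tick readings (on Windows these would presumably come from QueryPerformanceCounter, with the frequency from QueryPerformanceFrequency) and never reports a negative delta. Returning zero and resynchronising on a backwards reading is one possible policy, chosen here as an assumption; reusing the previous frame's delta would be another.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical delta-time helper that tolerates a counter running backwards.
class DeltaTimer {
public:
    DeltaTimer(std::int64_t ticks_per_second, std::int64_t start_ticks)
        : frequency_(ticks_per_second), last_ticks_(start_ticks) {
        assert(ticks_per_second > 0);
    }

    // Returns the seconds elapsed since the previous reading, never negative.
    double update(std::int64_t now_ticks) {
        if (now_ticks < last_ticks_) {
            // Bad reading: report no elapsed time and resynchronise, so
            // later deltas are measured from the new baseline.
            ++rejected_;
            last_ticks_ = now_ticks;
            return 0.0;
        }
        const double seconds =
            static_cast<double>(now_ticks - last_ticks_) /
            static_cast<double>(frequency_);
        last_ticks_ = now_ticks;
        return seconds;
    }

    // Number of backwards readings seen so far.
    int rejected() const { return rejected_; }

private:
    std::int64_t frequency_;
    std::int64_t last_ticks_;
    int rejected_ = 0;
};
```

Keeping a count of rejected readings makes it easy to log how often the problem actually occurs on a given machine.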

Also, don't set the thread affinity: you can deadlock yourself, introduce weird timing and perf bugs, and generally cause yourself grief.

1) The scheduler may assign a thread to whichever core has spare processing time. This is why you will often see software using 50% of a quad-core machine, yet when you check the graphs it's using half of each of the four cores.

2) See 1 ;)

Using SetThreadAffinityMask() is usually a bad idea except in the case where the thread only does timing. If you lock your thread to a single core, you remove all the benefit of having a multicore system in the first place. Your application can no longer scale. Even if you launch multiple instances of your app, they will still be locked to a single core.

We typically have to lock our game into a single thread when running timings because of this; we've found no effective way around it, since you need submicrosecond resolution when measuring perf.

One thing that makes it a little easier is that our engine is cut up into broad components that always run concurrently (e.g. game/logic "server", input/graphics "client", audio, and render are each their own thread), so what we do is lock each of those threads onto its own core and time them independently.

Similarly, because we know that e.g. the render loop is always going to be on core 0, we use that for timing framerate.
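For reference, pinning the calling thread to one core might be sketched as below. The helper names affinity_mask_for_core and pin_current_thread_to_core are hypothetical; the only real API calls are SetThreadAffinityMask and GetCurrentThread, and they are compiled only on Windows. Bit i of the mask allows the thread to run on logical processor i, so core 0 corresponds to mask 1. This is an illustration of the idea, not the engine's actual code.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: builds a mask with only the bit for `core` set.
std::uint64_t affinity_mask_for_core(unsigned core) {
    assert(core < 64);
    return std::uint64_t{1} << core;
}

// Hypothetical helper: tries to restrict the calling thread to `core`.
// Returns true on success; always returns false on non-Windows builds.
bool pin_current_thread_to_core(unsigned core) {
#ifdef _WIN32
    // DWORD_PTR is 32 bits wide in 32-bit builds, so only cores 0..31
    // can be expressed there.
    const DWORD_PTR mask =
        static_cast<DWORD_PTR>(affinity_mask_for_core(core));
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#else
    (void)core;
    return false;
#endif
}
```

A render thread would call pin_current_thread_to_core(0) once at startup, before taking its first timing sample.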
