
What is the overhead associated with std::condition_variable_any

I have read in many places that there is some overhead associated with std::condition_variable_any. Just wondering: what exactly is this overhead?

My guess is that since this is a generic condition variable that can work with any type of lock, it requires a manually rolled implementation of waiting (perhaps with another condition_variable and mutex, a futex, or something similar), so the extra overhead probably comes from that, as opposed to being just a thin wrapper around pthread_cond_wait() (and its equivalents on other systems). But I'm not sure...


As a follow-up: if I were implementing something that waits on, say, a shared mutex, would this type of condition variable be a bad choice because of the performance overhead? What else could I do in that situation?
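
For concreteness, the kind of usage I have in mind looks roughly like the sketch below; the data_ready flag and the reader/writer functions are just made up for illustration, not real code I have:

    #include <condition_variable>
    #include <mutex>
    #include <shared_mutex>

    std::shared_mutex m;
    std::condition_variable_any cv;
    bool data_ready = false;               // protected by m

    // A reader waits under a shared lock until a writer publishes data.
    void reader() {
        std::shared_lock<std::shared_mutex> lk(m);
        cv.wait(lk, [] { return data_ready; });
        // ... read the shared state while still holding the shared lock ...
    }

    // The writer publishes under an exclusive lock, then notifies all waiters.
    void writer() {
        {
            std::lock_guard<std::shared_mutex> lk(m);
            data_ready = true;
        }
        cv.notify_all();
    }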

pthread_cond_wait() / SleepConditionVariableSRW(), and hence the plain std::condition_variable::wait(), need just a single, atomic syscall to release the mutex, wait on the condition variable and re-acquire the mutex. The thread immediately goes to sleep, and another thread - ideally one that was blocked on the mutex - can take over immediately on the same core.
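
As a rough illustration of that native pattern, here is a minimal POSIX sketch (the ready flag and the function names are made up for the example); the unlock, the sleep and the re-lock all happen inside the single pthread_cond_wait() call:

    #include <pthread.h>

    static pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static bool ready = false;               // protected by mtx

    void wait_for_ready() {
        pthread_mutex_lock(&mtx);
        while (!ready)                       // loop guards against spurious wakeups
            pthread_cond_wait(&cond, &mtx);  // unlock + sleep + re-lock: one call
        pthread_mutex_unlock(&mtx);
    }

    void set_ready() {
        pthread_mutex_lock(&mtx);
        ready = true;
        pthread_cond_signal(&cond);          // wake one waiter
        pthread_mutex_unlock(&mtx);
    }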

With std::condition_variable_any, unlocking the passed BasicLockable and starting to wait on the native event / condition is more than a single syscall: it invokes the unlock() method on the BasicLockable first, and only then issues the syscall for waiting. So you have at least the overhead of the separate unlock(), plus you are more likely to trigger a less-than-ideal scheduling decision on the OS side. In the worst case, the unlock even causes a waiting thread to resume on a different core, with all the associated overhead.
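
To make that two-step sequence concrete, here is a simplified sketch of what a generic implementation has to do internally. This is not actual standard-library source; the member names are illustrative, and real implementations add exception-safety guards around the unlocking and re-locking:

    #include <condition_variable>
    #include <memory>
    #include <mutex>

    // Simplified model of a condition_variable_any-style wait: the caller's
    // BasicLockable is released by an ordinary method call, and only the wait
    // on the internal std::condition_variable reaches the OS.
    class cv_any_sketch {
        std::condition_variable cv_;
        std::shared_ptr<std::mutex> internal_ = std::make_shared<std::mutex>();
    public:
        template <class Lock>
        void wait(Lock& user_lock) {
            std::shared_ptr<std::mutex> keep_alive = internal_;
            std::unique_lock<std::mutex> internal_lock(*keep_alive);
            user_lock.unlock();          // step 1: plain method call, no combined syscall
            cv_.wait(internal_lock);     // step 2: the actual OS-level wait
            internal_lock.unlock();
            user_lock.lock();            // step 3: re-acquire the caller's lock afterwards
        }

        void notify_one() {
            std::lock_guard<std::mutex> guard(*internal_);
            cv_.notify_one();
        }
    };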

In the other direction, e.g. on spurious wakes, there are also OS-side scheduling optimizations possible when dealing with a native mutex (as used in std::mutex) which don't apply to a generic BasicLockable.

Both variants involve some bookkeeping in order to provide the notify_all() logic (it's actually one event / condition per waiting thread) as well as the guarantee that all methods are atomic, so both come with a small overhead anyway.

The real overhead difference comes down to how well the OS can make a good scheduling decision on the combined signal-and-wait-and-lock syscall. If the OS isn't smart about that scheduling, it makes virtually no difference which variant you use.
