Why is a pthread mutex considered "slower" than a futex?

Why are POSIX mutexes considered heavier or slower than futexes? Where does the overhead in the pthread mutex type come from? I've heard that pthread mutexes are based on futexes and, when uncontended, make no calls into the kernel. It seems, then, that a pthread mutex is merely a "wrapper" around a futex.

Is the overhead simply in the wrapper function call and the need for the mutex function to "set up" the futex (i.e., basically the stack setup for the pthread mutex function call)? Or are there some extra memory barrier steps taking place with the pthread mutex?

Futexes were created to improve the performance of pthread mutexes. NPTL uses futexes; LinuxThreads predated futexes, which I think is where the "slower" reputation comes from. NPTL mutexes may have some additional overhead, but it shouldn't be much.

Edit: The actual overhead basically consists of:

  • selecting the correct algorithm for the mutex type (normal, recursive, adaptive, error-checking; normal, robust, priority-inheritance, priority-protected), where the code heavily hints to the compiler that we are likely using a normal mutex (so that it can lay out the common path favorably for the CPU's branch prediction logic),
  • and a write of the current owner of the mutex if we manage to take it. This should normally be fast, since the owner field resides in the same cache line as the actual lock we have just taken, unless the lock is heavily contended and some other CPU accessed the lock between the time we took it and the time we attempted to write the owner (this write is unneeded for normal mutexes, but needed for error-checking and recursive mutexes).

So the cost ranges from a few cycles (typical case) to a few cycles + a branch misprediction + an additional cache miss (very worst case).
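The two costs listed above can be illustrated with a toy mutex. This is only a sketch and not glibc's code: the names `toy_mutex`, `toy_kind`, `toy_trylock` and `toy_unlock` are hypothetical, thread ids are plain nonzero integers supplied by the caller, and `__builtin_expect` is a GCC/Clang extension. It shows a type dispatch hinted toward the normal kind and an owner write that sits next to the lock word.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical mutex kinds, loosely mirroring the pthread mutex types. */
enum toy_kind { TOY_NORMAL, TOY_RECURSIVE, TOY_ERRORCHECK };

struct toy_mutex {
    atomic_int lock;    /* 0 = free, 1 = held */
    atomic_int owner;   /* id of the holder, 0 = nobody (same cache line as lock) */
    unsigned count;     /* recursion depth, used by TOY_RECURSIVE only */
    enum toy_kind kind;
};

/* One compare-and-swap: the only step a bare lock word would need. */
static int toy_acquire(struct toy_mutex *m, int self) {
    int expected = 0;
    if (!atomic_compare_exchange_strong(&m->lock, &expected, 1))
        return EBUSY;
    atomic_store(&m->owner, self);   /* the extra owner write */
    return 0;
}

int toy_trylock(struct toy_mutex *m, int self) {
    assert(m != NULL && self != 0);

    /* Type dispatch, hinted so the normal kind is the predicted path. */
    if (__builtin_expect(m->kind == TOY_NORMAL, 1))
        return toy_acquire(m, self);

    if (atomic_load(&m->owner) == self) {
        if (m->kind == TOY_ERRORCHECK)
            return EDEADLK;          /* relock by the owner is an error */
        m->count++;                  /* recursive relock */
        return 0;
    }
    int rc = toy_acquire(m, self);
    if (rc == 0)
        m->count = 1;
    return rc;
}

int toy_unlock(struct toy_mutex *m, int self) {
    assert(m != NULL);
    if (m->kind != TOY_NORMAL) {
        if (atomic_load(&m->owner) != self)
            return EPERM;            /* only the owner may unlock */
        if (m->kind == TOY_RECURSIVE && --m->count > 0)
            return 0;                /* still held recursively */
    }
    atomic_store(&m->owner, 0);
    atomic_store(&m->lock, 0);
    return 0;
}
```

For a normal mutex the work beyond the compare-and-swap is one predictable branch and one store, which matches the "few cycles" estimate above.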

The short answer to your question is that futexes are known to be implemented about as efficiently as possible, while a pthread mutex may or may not be. At minimum, a pthread mutex has overhead associated with determining the type of mutex, and futexes do not. So a futex will almost always be at least as efficient as a pthread mutex, unless and until someone thinks up some structure lighter than a futex and then releases a pthreads implementation that uses it for its default mutex.

Technically speaking, pthread mutexes are neither slower nor faster than futexes. pthread is just a standard API, so whether its mutexes are slow or fast depends on the implementation of that API.

Specifically, on Linux pthread mutexes are implemented on top of futexes and are therefore fast. Actually, you don't want to use the futex API itself: it is very hard to use correctly, it has no dedicated wrapper function in glibc (you have to go through the generic syscall() interface), and the atomic operations it relies on traditionally had to be written in non-portable assembly. Fortunately for us, the glibc maintainers have already coded all of this under the hood of the pthread mutex API.
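To give a feel for what using the raw interface involves, here is a minimal Linux-only sketch of a lock built directly on a futex word. It is not glibc's implementation. It follows the well-known three-state design (0 = unlocked, 1 = locked, 2 = locked with possible waiters) described in Ulrich Drepper's paper "Futexes Are Tricky"; the names `futex_call`, `fmutex_lock` and `fmutex_unlock` are made up for this example, and C11 atomics stand in for hand-written assembly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The kernel operates on an aligned 32-bit word. */
static_assert(sizeof(atomic_uint) == 4, "futex word must be 32 bits");

/* Hypothetical helper: glibc exports no futex() function, so use syscall(). */
static long futex_call(atomic_uint *uaddr, int op, uint32_t val) {
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* States: 0 = unlocked, 1 = locked, 2 = locked and there may be waiters. */
void fmutex_lock(atomic_uint *m) {
    unsigned c = 0;

    /* Fast path: uncontended, one compare-and-swap, no system call. */
    if (atomic_compare_exchange_strong(m, &c, 1))
        return;

    /* Slow path: mark the lock contended, then sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(m, 2);
    while (c != 0) {
        /* Sleeps only if *m is still 2; otherwise returns immediately. */
        futex_call(m, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(m, 2);
    }
}

void fmutex_unlock(atomic_uint *m) {
    /* Fast path: the state was 1, so nobody is waiting; no system call. */
    if (atomic_fetch_sub(m, 1) != 1) {
        atomic_store(m, 0);
        futex_call(m, FUTEX_WAKE_PRIVATE, 1);   /* wake one waiter */
    }
}
```

In the uncontended case both functions finish with a single atomic operation and never enter the kernel; the system call is only reached when another thread holds or is waiting for the lock. Even this small version depends on subtle ordering between the atomic operations and the wait call, which is a large part of why the pthread API is the recommended interface.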

Now, because most operating systems did not implement futexes, what programmers usually mean by "pthread mutex" is the performance you get from the usual implementation of pthread mutexes, which is slower.

So it's a statistical fact that in most POSIX-compliant operating systems the pthread mutex is implemented in kernel space and is slower than a futex. On Linux they have the same performance. There may be other operating systems where pthread mutexes are implemented in user space (for the uncontended case) and therefore perform better, but I am only aware of Linux at this point.

Because futexes stay in userspace as much as possible, they require fewer system calls, which is inherently faster because switching between user and kernel mode is expensive.

I assume you're talking about kernel threads when you talk about POSIX threads. It is entirely possible to have a purely userspace implementation of POSIX threads, which requires no system calls but has other issues of its own.

My understanding is that a futex is halfway between a kernel POSIX thread and a userspace POSIX thread.

On AMD64 a futex is 4 bytes, while an NPTL pthread_mutex_t is 40 bytes with glibc! Yes, there is a significant overhead.
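The size difference is easy to check on your own system. The sketch below assumes a futex word is a `uint32_t`; the helper names `futex_word_size`, `pthread_mutex_size` and `print_sizes` are invented for this example. The size of `pthread_mutex_t` depends on the C library and the ABI, so the code prints it instead of assuming a value.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A futex is always a 32-bit word, on every architecture. */
static_assert(sizeof(uint32_t) == 4, "futex word is 4 bytes");

size_t futex_word_size(void) {
    return sizeof(uint32_t);
}

size_t pthread_mutex_size(void) {
    return sizeof(pthread_mutex_t);
}

/* Print both sizes so they can be compared on the current platform. */
void print_sizes(void) {
    printf("futex word:      %zu bytes\n", futex_word_size());
    printf("pthread_mutex_t: %zu bytes\n", pthread_mutex_size());
}
```

The extra bytes hold fields such as the owner, the recursion count and the mutex kind, which a bare futex word does not carry.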
