
Performance of pthread_mutex_lock/unlock

I've noticed that I take a pretty big performance hit when I have an algorithm that locks and unlocks a mutex a lot.

Is there any way to reduce this overhead? Would using a semaphore be more or less efficient?

Thanks

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct _treenode{
   struct _treenode *leftNode;
   struct _treenode *rightNode;
   int32_t data;
   pthread_mutex_t mutex;
}TreeNode;

pthread_mutex_t _initMutex = PTHREAD_MUTEX_INITIALIZER;

int32_t insertNode(TreeNode **_trunk, int32_t data){
   TreeNode **current;
   pthread_mutex_t *parentMutex = NULL, *currentMutex = &_initMutex;

   if(_trunk != NULL){
      current = _trunk;
      while(*current != NULL){
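         /* hand-over-hand: lock the current node before unlocking its parent */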
         pthread_mutex_lock(&(*current)->mutex);
         currentMutex = &(*current)->mutex;
         if((*current)->data < data){
            if(parentMutex != NULL)
               pthread_mutex_unlock(parentMutex);
            parentMutex = currentMutex;
            current = &(*current)->rightNode;
         }else if((*current)->data > data){
            if(parentMutex != NULL)
               pthread_mutex_unlock(parentMutex);
            parentMutex = currentMutex;
            current = &(*current)->leftNode;
         }else{
            pthread_mutex_unlock(currentMutex);
            if(parentMutex != NULL)
               pthread_mutex_unlock(parentMutex);
            return 0;
         }
      }
      *current = malloc(sizeof(TreeNode));
      pthread_mutex_init(&(*current)->mutex, NULL);
      pthread_mutex_lock(&(*current)->mutex);
      (*current)->leftNode = NULL;
      (*current)->rightNode = NULL;
      (*current)->data = data;
      pthread_mutex_unlock(&(*current)->mutex);
      pthread_mutex_unlock(currentMutex);
   }else{
      return 1;
   }
   return 0;
}

int main(){
   int i;
   TreeNode *trunk = NULL;
   for(i=0; i<1000000; i++){
      insertNode(&trunk, rand() % 50000);
   }
   return 0;
}

Instead of worrying about the blades of grass, step back and observe the whole forest.

Any algorithm that depends on two threads potentially stepping closely on each other's toes is inherently inefficient. Try to find a way to drastically reduce the need for interaction.

For example, if one thread produces data and the other consumes it, one can easily think up an inefficient algorithm where the producer publishes the data in shared memory and then waits for the other to consume it. Meanwhile the consumer is waiting for the producer to finish, and so on. This is all much simplified by the producer writing into a file or pipe and the consumer reading from it, as in the sketch below.
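A minimal sketch of that pipe-based hand-off (my illustration, not code from the question); the pipe buffers the data and blocks each side only when it must, so no explicit locking is needed:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int fds[2];                     /* fds[0]: read end, fds[1]: write end */

static void *producer(void *arg){
   (void)arg;
   for(int i = 0; i < 100; i++)
      write(fds[1], &i, sizeof i);     /* blocks only if the pipe is full */
   close(fds[1]);                      /* signals EOF to the consumer */
   return NULL;
}

int main(void){
   int v;
   pthread_t t;
   if(pipe(fds) != 0)
      return 1;
   pthread_create(&t, NULL, producer, NULL);
   while(read(fds[0], &v, sizeof v) == sizeof v)   /* blocks until data arrives */
      printf("consumed %d\n", v);
   pthread_join(t, NULL);
   return 0;
}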

pthread_mutex_lock and pthread_mutex_unlock vary in cost depending on contention:

  1. Single-thread use - either only one thread exists, or only one thread is using the mutex and the resource it protects: locking is virtually free, perhaps 80-100 cycles at most (see the timing sketch after this list).
  2. Multiple threads use the resource, but locks are held only for very short intervals and contention is rare: locking has some cost, and it's hard to measure; the cost consists mostly of invalidating other cores'/CPUs' cache lines.
  3. Significant lock contention: nearly every lock and unlock operation will require assistance from the kernel, and the cost is easily several thousand (possibly even tens of thousands of) cycles per lock/unlock.
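For the uncontended case (point 1 above), here is a rough single-threaded timing sketch; absolute numbers depend on the CPU, libc, and compiler flags:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

int main(void){
   pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
   enum { N = 10000000 };
   struct timespec t0, t1;
   clock_gettime(CLOCK_MONOTONIC, &t0);
   for(int i = 0; i < N; i++){
      pthread_mutex_lock(&m);           /* never contended: fast path only */
      pthread_mutex_unlock(&m);
   }
   clock_gettime(CLOCK_MONOTONIC, &t1);
   double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
   printf("%.1f ns per lock/unlock pair\n", ns / N);
   return 0;
}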

Still, mutexes should be the least expensive locking primitive in most situations and on most implementations. Occasionally spinlocks may perform better. I would never expect semaphores to perform better.

As far as I can see, your locking strategy is not optimal, since most of the locks will not be taken to change the data, but just to read and find the way through the tree.

pthread_rwlock_t could help with this. You'd take only read-locks on the path down the tree until you hit a node where you want to make a modification. There you would then take a write-lock. That way, other threads walking down the tree in a different branch could perform the same task without disturbing each other. A sketch of the read-locked descent follows.
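Here is my illustration of that descent (it assumes the TreeNode's mutex is replaced by a pthread_rwlock_t field named rwlock, plus a separate lock guarding the root pointer). Since POSIX rwlocks cannot be upgraded in place, an insert would descend the same way, then re-acquire the last lock in write mode and re-check before linking the new node:

int containsNode(TreeNode **trunk, pthread_rwlock_t *rootLock, int32_t data){
   pthread_rwlock_rdlock(rootLock);             /* guards the root pointer */
   pthread_rwlock_t *held = rootLock;
   TreeNode *current = *trunk;
   while(current != NULL){
      pthread_rwlock_rdlock(&current->rwlock);  /* hand-over-hand: lock the child, */
      pthread_rwlock_unlock(held);              /* then release what we held */
      held = &current->rwlock;
      if(current->data == data){
         pthread_rwlock_unlock(held);
         return 1;                              /* found */
      }
      current = (current->data < data) ? current->rightNode : current->leftNode;
   }
   pthread_rwlock_unlock(held);
   return 0;                                    /* not found */
}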

A decent implementation of pthread_rwlock_t would implement those read-locks with a counter for the readers that it changes with atomic operations, as long as there is no contention with writers. This should be very fast. Once there is contention, it would be as costly as a mutex, I think.
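A very stripped-down sketch of that fast path (hypothetical names, C11 atomics, spinning instead of blocking, and no fairness; a real pthread_rwlock_t does much more):

#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
   atomic_int readers;                  /* number of active readers */
   atomic_bool writer;                  /* set while a writer holds the lock */
} rwlock_sketch;

void read_lock(rwlock_sketch *l){
   for(;;){
      atomic_fetch_add(&l->readers, 1); /* uncontended path: one atomic op */
      if(!atomic_load(&l->writer))
         return;
      atomic_fetch_sub(&l->readers, 1); /* a writer is active: back off */
      while(atomic_load(&l->writer))
         ;                              /* spin; a real lock would block */
   }
}

void read_unlock(rwlock_sketch *l){
   atomic_fetch_sub(&l->readers, 1);
}

void write_lock(rwlock_sketch *l){
   bool expected = false;
   while(!atomic_compare_exchange_weak(&l->writer, &expected, true))
      expected = false;                 /* admit one writer at a time */
   while(atomic_load(&l->readers) > 0)
      ;                                 /* wait for readers to drain */
}

void write_unlock(rwlock_sketch *l){
   atomic_store(&l->writer, false);
}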

Your locks are probably too fine-grained. Of course, the optimal granularity may vary depending on the workload.

You could use a single lock for the whole tree, and it may perform better. But if you do lots of reading and relatively few insertions/deletions, you would end up with the whole tree locked often for no good reason. You may want to use a reader-writer lock, which allows several readers at the same time; a sketch follows.
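A coarse-grained sketch of that (my illustration, reusing the question's TreeNode; its per-node mutex goes unused here):

pthread_rwlock_t treeLock = PTHREAD_RWLOCK_INITIALIZER;

int treeContains(TreeNode *trunk, int32_t data){
   pthread_rwlock_rdlock(&treeLock);    /* many readers may search at once */
   while(trunk != NULL && trunk->data != data)
      trunk = (trunk->data < data) ? trunk->rightNode : trunk->leftNode;
   int found = (trunk != NULL);
   pthread_rwlock_unlock(&treeLock);
   return found;
}

int32_t treeInsert(TreeNode **trunk, int32_t data){
   pthread_rwlock_wrlock(&treeLock);    /* writers get exclusive access */
   TreeNode **current = trunk;
   while(*current != NULL){
      if((*current)->data < data)
         current = &(*current)->rightNode;
      else if((*current)->data > data)
         current = &(*current)->leftNode;
      else{
         pthread_rwlock_unlock(&treeLock);
         return 0;                      /* duplicate: nothing to do */
      }
   }
   *current = malloc(sizeof(TreeNode));
   (*current)->leftNode = NULL;
   (*current)->rightNode = NULL;
   (*current)->data = data;
   pthread_rwlock_unlock(&treeLock);
   return 0;
}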

Your question reminded me of this other one, which compares fine-grained and coarse-grained locking for a linked list. In the coarse-grained version the threads ran in turn (not in parallel), and the total running time was slightly more than the sum of the threads' running times; in the fine-grained version the total running time was much less than the sum of the threads' running times. But the added overhead of fine-grained locking completely offset that benefit, making the fine-grained version slower than the coarse-grained one.

Locking and unlocking are very expensive operations in the case of pthread_mutex_lock/unlock. With more details on the algorithm I could make some suggestions, but as it stands I can't tell you anything for certain. Semaphores are an alternative (again, depending on the algorithm), and barriers are another useful method for concurrency. To reduce the overhead you can make your locks finer- or coarser-grained. Locks inside loops that iterate many times are a bad idea, and you may want to move them outside the loop, as in the sketch below. This is just one example, but there are probably more I could come up with. It's about determining whether the cost of the lock is greater than that of the critical section of your code. If you provide your algorithm or some sample code, I'd be glad to take a look at it.
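A sketch of hoisting a lock out of a loop (hypothetical names; only valid if holding the lock for the whole loop is acceptable for your other threads):

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
long shared_sum = 0;

void add_items_per_iteration(const int *items, int n){
   for(int i = 0; i < n; i++){
      pthread_mutex_lock(&m);           /* n lock/unlock pairs */
      shared_sum += items[i];
      pthread_mutex_unlock(&m);
   }
}

void add_items_hoisted(const int *items, int n){
   pthread_mutex_lock(&m);              /* one lock/unlock pair */
   for(int i = 0; i < n; i++)
      shared_sum += items[i];
   pthread_mutex_unlock(&m);
}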

pthread_mutex_lock and pthread_cond_wait are OS primitives: they put the calling thread to sleep, transferring control to another thread. That is, they involve syscalls and a lot of overhead. In tight integration between two threads, you don't really want to relinquish control for even a cycle.

Given that, I suggest using volatile int variables instead of mutexes:

volatile int data_ready = 0;   /* set to 1 by the producing thread */
/*  ... */
while (!data_ready);           /* busy-wait until the producer signals */
process_data();
data_ready = 0;                /* hand the buffer back to the producer */
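A fuller two-thread sketch of that hand-off (my elaboration of the snippet above; note that plain volatile is not a synchronization primitive in the C11 memory model, so this mirrors the suggestion as written rather than portable practice):

#include <pthread.h>
#include <stdio.h>

volatile int data_ready = 0;
volatile int shared_value = 0;

void *producer(void *arg){
   (void)arg;
   for(int i = 1; i <= 5; i++){
      while(data_ready)
         ;                              /* wait until the last value was taken */
      shared_value = i;
      data_ready = 1;                   /* publish the new value */
   }
   return NULL;
}

int main(void){
   pthread_t t;
   pthread_create(&t, NULL, producer, NULL);
   for(int n = 0; n < 5; n++){
      while(!data_ready)
         ;                              /* spin until a value is published */
      printf("got %d\n", shared_value);
      data_ready = 0;                   /* hand the slot back */
   }
   pthread_join(t, NULL);
   return 0;
}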
