Performance test: sem_t vs. dispatch_semaphore_t and pthread_once_t vs. dispatch_once_t

I wanted to know whether it would be better/faster to use POSIX calls like pthread_once() and sem_wait() or the dispatch_* functions, so I created a little test and am surprised at the results (questions and results are at the end).

In the test code I am using mach_absolute_time() to time the calls. I really don't care that this does not exactly match up with nanoseconds; I am comparing the values with each other, so the exact time units don't matter, only the differences between the intervals do. The numbers in the results section are repeatable and not averaged; I could have averaged the times, but I am not looking for exact numbers.
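
For completeness, here is a minimal sketch of how the raw mach_absolute_time() ticks could be converted to nanoseconds with mach_timebase_info(), should exact units ever matter (the variable names are illustrative only):

#include <mach/mach_time.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    mach_timebase_info_data_t timebase;
    mach_timebase_info(&timebase);          // numer/denom scale ticks to nanoseconds

    uint64_t start = mach_absolute_time();
    /* ... code being timed ... */
    uint64_t stop = mach_absolute_time();

    uint64_t elapsedNs = (stop - start) * timebase.numer / timebase.denom;
    printf("elapsed = %llu ns\n", (unsigned long long)elapsedNs);
    return 0;
}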

test.m (simple console application; easy to compile):

#import <Foundation/Foundation.h>
#import <dispatch/dispatch.h>
#include <semaphore.h>
#include <pthread.h>
#include <errno.h>           // errno, EINVAL (used by the barrier shim below)
#include <fcntl.h>           // O_CREAT for sem_open()
#include <time.h>
#include <mach/mach_time.h>

// *sigh* OSX does not have pthread_barrier (you can ignore the pthread_barrier 
// code, the interesting stuff is lower)
typedef int pthread_barrierattr_t;
typedef struct
{
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int count;
    int tripCount;
} pthread_barrier_t;


int pthread_barrier_init(pthread_barrier_t *barrier, const pthread_barrierattr_t *attr, unsigned int count)
{
    if(count == 0)
    {
        errno = EINVAL;
        return -1;
    }
    if(pthread_mutex_init(&barrier->mutex, 0) < 0)
    {
        return -1;
    }
    if(pthread_cond_init(&barrier->cond, 0) < 0)
    {
        pthread_mutex_destroy(&barrier->mutex);
        return -1;
    }
    barrier->tripCount = count;
    barrier->count = 0;

    return 0;
}

int pthread_barrier_destroy(pthread_barrier_t *barrier)
{
    pthread_cond_destroy(&barrier->cond);
    pthread_mutex_destroy(&barrier->mutex);
    return 0;
}

int pthread_barrier_wait(pthread_barrier_t *barrier)
{
    pthread_mutex_lock(&barrier->mutex);
    ++(barrier->count);
    if(barrier->count >= barrier->tripCount)
    {
        barrier->count = 0;
        pthread_cond_broadcast(&barrier->cond);
        pthread_mutex_unlock(&barrier->mutex);
        return 1;
    }
    else
    {
        pthread_cond_wait(&barrier->cond, &(barrier->mutex));
        pthread_mutex_unlock(&barrier->mutex);
        return 0;
    }
}

//
// ok you can start paying attention now...
//

void onceFunction(void)
{
}

@interface SemaphoreTester : NSObject
{
    sem_t *sem1;
    sem_t *sem2;
    pthread_barrier_t *startBarrier;
    pthread_barrier_t *finishBarrier;
}
@property (nonatomic, assign) sem_t *sem1;
@property (nonatomic, assign) sem_t *sem2;
@property (nonatomic, assign) pthread_barrier_t *startBarrier;
@property (nonatomic, assign) pthread_barrier_t *finishBarrier;
@end
@implementation SemaphoreTester
@synthesize sem1, sem2, startBarrier, finishBarrier;
- (void)thread1
{
    pthread_barrier_wait(startBarrier);
    for(int i = 0; i < 100000; i++)
    {
        sem_wait(sem1);
        sem_post(sem2);
    }
    pthread_barrier_wait(finishBarrier);
}

- (void)thread2
{
    pthread_barrier_wait(startBarrier);
    for(int i = 0; i < 100000; i++)
    {
        sem_wait(sem2);
        sem_post(sem1);
    }
    pthread_barrier_wait(finishBarrier);
}
@end


int main (int argc, const char * argv[]) 
{
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    int64_t start;
    int64_t stop;

    // semaphore non contention test
    {
        // grrr, OSX doesn't have sem_init
        sem_t *sem1 = sem_open("sem1", O_CREAT, 0777, 0);

        start = mach_absolute_time();
        for(int i = 0; i < 100000; i++)
        {
            sem_post(sem1);
            sem_wait(sem1);
        }
        stop = mach_absolute_time();
        sem_close(sem1);

        NSLog(@"0 Contention time                         = %d", stop - start);
    }

    // semaphore contention test
    {
        __block sem_t *sem1 = sem_open("sem1", O_CREAT, 0777, 0);
        __block sem_t *sem2 = sem_open("sem2", O_CREAT, 0777, 0);
        __block pthread_barrier_t startBarrier;
        pthread_barrier_init(&startBarrier, NULL, 3);
        __block pthread_barrier_t finishBarrier;
        pthread_barrier_init(&finishBarrier, NULL, 3);

        dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0);
        dispatch_async(queue, ^{
            pthread_barrier_wait(&startBarrier);
            for(int i = 0; i < 100000; i++)
            {
                sem_wait(sem1);
                sem_post(sem2);
            }
            pthread_barrier_wait(&finishBarrier);
        });
        dispatch_async(queue, ^{
            pthread_barrier_wait(&startBarrier);
            for(int i = 0; i < 100000; i++)
            {
                sem_wait(sem2);
                sem_post(sem1);
            }
            pthread_barrier_wait(&finishBarrier);
        });
        pthread_barrier_wait(&startBarrier);
        // start timing, everyone hit this point
        start = mach_absolute_time();
        // kick it off
        sem_post(sem2);
        pthread_barrier_wait(&finishBarrier);
        // stop timing, everyone hit the finish point
        stop = mach_absolute_time();
        sem_close(sem1);
        sem_close(sem2);
        NSLog(@"2 Threads always contenting time          = %d", stop - start);
        pthread_barrier_destroy(&startBarrier);
        pthread_barrier_destroy(&finishBarrier);
    }   

    // NSThread semaphore contention test (logged as "NSTasks" to match the results below)
    {
        sem_t *sem1 = sem_open("sem1", O_CREAT, 0777, 0);
        sem_t *sem2 = sem_open("sem2", O_CREAT, 0777, 0);
        pthread_barrier_t startBarrier;
        pthread_barrier_init(&startBarrier, NULL, 3);
        pthread_barrier_t finishBarrier;
        pthread_barrier_init(&finishBarrier, NULL, 3);

        SemaphoreTester *tester = [[[SemaphoreTester alloc] init] autorelease];
        tester.sem1 = sem1;
        tester.sem2 = sem2;
        tester.startBarrier = &startBarrier;
        tester.finishBarrier = &finishBarrier;
        [NSThread detachNewThreadSelector:@selector(thread1) toTarget:tester withObject:nil];
        [NSThread detachNewThreadSelector:@selector(thread2) toTarget:tester withObject:nil];
        pthread_barrier_wait(&startBarrier);
        // start timing, everyone hit this point
        start = mach_absolute_time();
        // kick it off
        sem_post(sem2);
        pthread_barrier_wait(&finishBarrier);
        // stop timing, everyone hit the finish point
        stop = mach_absolute_time();
        sem_close(sem1);
        sem_close(sem2);
        NSLog(@"2 NSTasks always contenting time          = %d", stop - start);
        pthread_barrier_destroy(&startBarrier);
        pthread_barrier_destroy(&finishBarrier);
    }   

    // dispatch_semaphore non contention test
    {
        dispatch_semaphore_t sem1 = dispatch_semaphore_create(0);

        start = mach_absolute_time();
        for(int i = 0; i < 100000; i++)
        {
            dispatch_semaphore_signal(sem1);
            dispatch_semaphore_wait(sem1, DISPATCH_TIME_FOREVER);
        }
        stop = mach_absolute_time();

        NSLog(@"Dispatch 0 Contention time                = %d", stop - start);
    }


    // dispatch_semaphore contention test
    {   
        __block dispatch_semaphore_t sem1 = dispatch_semaphore_create(0);
        __block dispatch_semaphore_t sem2 = dispatch_semaphore_create(0);
        __block pthread_barrier_t startBarrier;
        pthread_barrier_init(&startBarrier, NULL, 3);
        __block pthread_barrier_t finishBarrier;
        pthread_barrier_init(&finishBarrier, NULL, 3);

        dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0);
        dispatch_async(queue, ^{
            pthread_barrier_wait(&startBarrier);
            for(int i = 0; i < 100000; i++)
            {
                dispatch_semaphore_wait(sem1, DISPATCH_TIME_FOREVER);
                dispatch_semaphore_signal(sem2);
            }
            pthread_barrier_wait(&finishBarrier);
        });
        dispatch_async(queue, ^{
            pthread_barrier_wait(&startBarrier);
            for(int i = 0; i < 100000; i++)
            {
                dispatch_semaphore_wait(sem2, DISPATCH_TIME_FOREVER);
                dispatch_semaphore_signal(sem1);
            }
            pthread_barrier_wait(&finishBarrier);
        });
        pthread_barrier_wait(&startBarrier);
        // start timing, everyone hit this point
        start = mach_absolute_time();
        // kick it off
        dispatch_semaphore_signal(sem2);
        pthread_barrier_wait(&finishBarrier);
        // stop timing, everyone hit the finish point
        stop = mach_absolute_time();

        NSLog(@"Dispatch 2 Threads always contenting time = %d", stop - start);
        pthread_barrier_destroy(&startBarrier);
        pthread_barrier_destroy(&finishBarrier);
    }   

    // pthread_once time
    {
        pthread_once_t once = PTHREAD_ONCE_INIT;
        start = mach_absolute_time();
        for(int i = 0; i < 100000; i++)
        {
            pthread_once(&once, onceFunction);
        }
        stop = mach_absolute_time();

        NSLog(@"pthread_once time  = %d", stop - start);
    }

    // dispatch_once time
    {
        dispatch_once_t once = 0;
        start = mach_absolute_time();
        for(int i = 0; i < 100000; i++)
        {
            dispatch_once(&once, ^{});
        }
        stop = mach_absolute_time();

        NSLog(@"dispatch_once time = %d", stop - start);
    }

    [pool drain];
    return 0;
}

On my iMac (Snow Leopard Server 10.6.4):

Model Identifier: iMac7,1
  Processor Name:   Intel Core 2 Duo
  Processor Speed:  2.4 GHz
  Number Of Processors: 1
  Total Number Of Cores:    2
  L2 Cache: 4 MB
  Memory:   4 GB
  Bus Speed:    800 MHz

I get:

0 Contention time                         =    101410439
2 Threads always contenting time          =    109748686
2 NSTasks always contenting time          =    113225207
0 Contention named semaphore time         =    166061832
2 Threads named semaphore contention time =    203913476
2 NSTasks named semaphore contention time =    204988744
Dispatch 0 Contention time                =      3411439
Dispatch 2 Threads always contenting time =    708073977
pthread_once time  =      2707770
dispatch_once time =        87433

On my MacBook Pro (Snow Leopard 10.6.4):

Model Identifier: MacBookPro6,2
  Processor Name:   Intel Core i5
  Processor Speed:  2.4 GHz
  Number Of Processors: 1
  Total Number Of Cores:    2 (though HT is enabled)
  L2 Cache (per core):  256 KB
  L3 Cache: 3 MB
  Memory:   8 GB
  Processor Interconnect Speed: 4.8 GT/s

I got:

0 Contention time                         =     74172042
2 Threads always contenting time          =     82975742
2 NSTasks always contenting time          =     82996716
0 Contention named semaphore time         =    106772641
2 Threads named semaphore contention time =    162761973
2 NSTasks named semaphore contention time =    162919844
Dispatch 0 Contention time                =      1634941
Dispatch 2 Threads always contenting time =    759753865
pthread_once time  =      1516787
dispatch_once time =       120778

On an iPhone 3GS (iOS 4.0.2) I got:

0 Contention time                         =      5971929
2 Threads always contenting time          =     11989710
2 NSTasks always contenting time          =     11950564
0 Contention named semaphore time         =     16721876
2 Threads named semaphore contention time =     35333045
2 NSTasks named semaphore contention time =     35296579
Dispatch 0 Contention time                =       151909
Dispatch 2 Threads always contenting time =     46946548
pthread_once time  =       193592
dispatch_once time =        25071

Questions and statements:

  • sem_wait() and sem_post() are slow when not under contention
    • Why is this the case?
    • Does OSX not care about compatible APIs? Is there some legacy code that forces this to be slow?
    • Why aren't these numbers the same as for the dispatch_semaphore functions?
  • sem_wait() and sem_post() are nearly as slow under contention as when they are not (there is a difference, but I expected a huge gap between the contended and uncontended cases, i.e. numbers like those from the dispatch_semaphore code)
  • sem_wait() and sem_post() are slower when using named semaphores.
    • Why? Is this because the semaphore has to be synced between processes? Maybe there is more baggage in doing that.
  • dispatch_semaphore_wait() and dispatch_semaphore_signal() are crazy fast when not under contention (no surprise here, since Apple is touting this a lot).
  • dispatch_semaphore_wait() and dispatch_semaphore_signal() are 3x slower than sem_wait() and sem_post() when under contention
    • Why is this so slow? This does not make sense to me. I would have expected it to be on par with sem_t under contention.
  • dispatch_once() is faster than pthread_once(), around 10x. Why? The only thing I can tell from the headers is that dispatch_once() avoids the function-call burden that pthread_once() has.

Motivation: I am presented with two sets of tools to get the job done for semaphores or once calls (I actually found other semaphore variants in the meantime, but I will ignore those unless they come up as a better option). I just want to know the best tool for the job (if you have the option of driving a screw with a Phillips or a flathead, I would choose the Phillips when I don't have to torque the screw and the flathead when I do). It seems that if I start writing utilities with libdispatch I might not be able to port them to other operating systems that don't have libdispatch working yet... but it is so enticing to use ;)

As it stands: I will be using libdispatch when I don't have to worry about portability, and POSIX calls when I do.

Thanks!

sem_wait() and sem_post() are heavyweight synchronization facilities that can be used between processes. They always involve round trips to the kernel, and probably always require your thread to be rescheduled. They are generally not the right choice for in-process synchronization. I'm not sure why the named variants would be slower than the anonymous ones...
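
To make the cross-process point concrete, here is a minimal sketch (the semaphore name and permissions are made up) of two processes sharing a named semaphore across fork(); both sides refer to the same kernel-managed object, which is part of why every wait/post is a system call:

#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* Both processes open the same kernel-managed object by name. */
    sem_t *sem = sem_open("/example_sem", O_CREAT, 0644, 0);

    pid_t pid = fork();
    if (pid == 0)
    {
        /* child: hand control back to the parent */
        sem_post(sem);
        _exit(0);
    }

    sem_wait(sem);              /* parent blocks until the child posts */
    waitpid(pid, NULL, 0);
    sem_close(sem);
    sem_unlink("/example_sem"); /* remove the name once nobody needs it */
    return 0;
}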

Mac OS X is actually pretty good about POSIX compatibility... but the POSIX specifications have a lot of optional functions, and the Mac doesn't have them all. Your post is actually the first I've ever heard of pthread_barriers, so I'm guessing they're either relatively recent or not all that common. (I haven't paid much attention to pthreads evolution for the past ten years or so.)

The reason the dispatch stuff falls apart under forced extreme contention is probably that under the covers the behavior is similar to spin locks. Your dispatch worker threads are very likely wasting a good chunk of their quanta under the optimistic assumption that the contended resource is going to become available any cycle now... A bit of time with Shark would tell you for sure. The take-home point, though, should be that "optimizing" the thrashing during contention is a poor investment of programmer time. Instead, spend the time optimizing the code to avoid heavy contention in the first place.

If you really have a resource that is an unavoidable bottleneck within your process, putting a semaphore around it is massively sub-optimal. Put it on its own serial dispatch queue, and as much as possible dispatch_async blocks to be executed on that queue.
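
A minimal sketch of that shape, assuming a made-up queue label and a stand-in counter for the contended resource:

#include <dispatch/dispatch.h>
#include <stdio.h>

/* Hypothetical shared state; only the serial queue ever touches it. */
static long sharedCounter = 0;

int main(void)
{
    /* One serial queue owns the contended resource. */
    dispatch_queue_t resourceQueue =
        dispatch_queue_create("com.example.resource", NULL);

    /* Writers enqueue work instead of blocking on a semaphore. */
    for (int i = 0; i < 100000; i++)
    {
        dispatch_async(resourceQueue, ^{
            sharedCounter++;
        });
    }

    /* A synchronous hop only when the caller actually needs the value;
       this also drains everything queued before it. */
    __block long snapshot = 0;
    dispatch_sync(resourceQueue, ^{
        snapshot = sharedCounter;
    });
    printf("counter = %ld\n", snapshot);

    dispatch_release(resourceQueue);   /* manual retain/release era */
    return 0;
}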

Finally, dispatch_once() is faster than pthread_once() because it's spec'd and implemented to be fast on current processors. Apple could probably speed up the pthread_once() implementation, as I suspect the reference implementation uses pthread synchronization primitives, but... well... they've provided all of the libdispatch goodness instead. :-)
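
For what it's worth, the usual shape for dispatch_once() is lazy one-time initialization, roughly like the sketch below (the lookup table and its contents are made up); as the question notes, the already-initialized common case avoids a function call:

#include <dispatch/dispatch.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical lazily-built lookup table; names are illustrative only. */
static int *lookupTable = NULL;

static int *sharedLookupTable(void)
{
    static dispatch_once_t onceToken;   /* must be static or global, zero-initialized */
    dispatch_once(&onceToken, ^{
        /* Runs exactly once, even if many threads race to get here. */
        lookupTable = calloc(256, sizeof(int));
        for (int i = 0; i < 256; i++)
            lookupTable[i] = i * i;
    });
    return lookupTable;
}

int main(void)
{
    printf("%d\n", sharedLookupTable()[12]);   /* prints 144 */
    return 0;
}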
