
WaitForSingleObject vs Interlocked*

WinAPI provides the WaitForSingleObject() and ReleaseMutex() function pair, as well as the Interlocked*() family of functions. I decided to compare the performance of acquiring a single mutex against exchanging an interlocked variable.

// Method 1: kernel mutex
HANDLE mutex = CreateMutex(NULL, FALSE, NULL);
WaitForSingleObject(mutex, INFINITE);
// ..
ReleaseMutex(mutex);

// Method 2: interlocked variable; 0 unlocked, 1 locked
volatile LONG lock = 0;
while(InterlockedCompareExchange(&lock, 1, 0))
  SwitchToThread();
// ..
InterlockedExchange(&lock, 0);
SwitchToThread();

I measured the performance of these two methods and found that the Interlocked*() version is about 38% faster. Why is that?

Here's my performance test:

#include <windows.h>
#include <iostream>
#include <conio.h>
using namespace std;

volatile LONG interlocked_variable = 0; // 0 unlocked, 1 locked
volatile LONG run                  = 1; // cleared by main() to stop the workers

DWORD WINAPI thread(LPVOID lpParam)
{
    unsigned int* counter = (unsigned int*)lpParam;
    while(run)
    {
        // spin until the flag flips from 0 (unlocked) to 1 (locked)
        while(InterlockedCompareExchange(&interlocked_variable, 1, 0))
            SwitchToThread();
        ++(*counter);
        InterlockedExchange(&interlocked_variable, 0); // unlock
        SwitchToThread();
    }

    return 0;
}

int main()
{
    unsigned int num_threads;
    cout << "number of threads: ";
    cin >> num_threads;
    unsigned int* num_cycles = new unsigned int[num_threads];
    HANDLE* handles          = new HANDLE[num_threads];
    DWORD s_time, e_time;

    s_time = GetTickCount();
    for(unsigned int i = 0; i < num_threads; ++i)
    {
        num_cycles[i] = 0;
        // stack size and creation flags are integers, so pass 0 rather than NULL
        handles[i] = CreateThread(NULL, 0, thread, &num_cycles[i], 0, NULL);
    }
    _getch(); // let the workers run until a key is pressed
    InterlockedExchange(&run, 0);
    e_time = GetTickCount();

    // wait for every worker to exit before reading the counters or freeing memory
    for(unsigned int i = 0; i < num_threads; ++i)
    {
        if(handles[i] == NULL)
            continue; // thread creation failed
        WaitForSingleObject(handles[i], INFINITE);
        CloseHandle(handles[i]);
    }

    unsigned long long total = 0;
    for(unsigned int i = 0; i < num_threads; ++i)
        total += num_cycles[i];
    for(unsigned int i = 0; i < num_threads; ++i)
        cout << "\nthread " << i << ":\t" << num_cycles[i] << " cyc\t" << ((double)num_cycles[i] / (double)total) * 100 << "%";
    cout << "\n----------------\n"
        << "cycles total:\t" << total
        << "\ntime elapsed:\t" << e_time - s_time << " ms"
        << "\n----------------"
        << '\n' << (double)(e_time - s_time) / (double)(total) << " ms/op\n";

    delete[] handles;
    delete[] num_cycles;
    _getch();
    return 0;
}

WaitForSingleObject does not have to be faster. It covers a much wider range of synchronization scenarios; in particular, you can wait on handles that do not "belong" to your process, which makes interprocess synchronization possible. Taking all this into consideration, it is only 38% slower according to your test.

If everything lives inside your process and every nanosecond counts, InterlockedXxx might be the better option, but it is by no means strictly superior.

Additionally, you might want to look at the Slim Reader/Writer (SRW) Locks API. You could perhaps build a similar class or set of functions based purely on InterlockedXxx with slightly better performance. The point, however, is that SRW locks are ready to use out of the box, with documented behavior, stability, and decent performance anyway.

You are not comparing equivalent locks, so it's not surprising that the performance is so different.

A mutex allows cross-process locking, and because of that flexibility it is likely one of the most expensive ways to lock. When you block on it, it will usually put your thread to sleep, so the thread uses no CPU until it is woken up having acquired the lock. This allows other code to use the CPU.

Your InterlockedCompareExchange() code is a simple spin lock. You will burn CPU while waiting for the lock.
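To make the distinction concrete, here is a minimal portable sketch, assuming standard C++11 rather than WinAPI. std::atomic's compare_exchange_strong() stands in for InterlockedCompareExchange(), std::this_thread::yield() for SwitchToThread(), and std::mutex for a blocking lock (it is process-local, so it is only a rough stand-in for a Win32 mutex handle). The names SpinLock and run_counter are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Spin lock in the style of the question: 0 = unlocked, 1 = locked.
class SpinLock {
public:
    void lock() {
        long expected = 0;
        // Try to flip 0 -> 1; on failure, yield and try again.
        while (!state_.compare_exchange_strong(expected, 1, std::memory_order_acquire)) {
            expected = 0; // the failed exchange overwrote it with the observed value
            std::this_thread::yield();
        }
    }
    void unlock() { state_.store(0, std::memory_order_release); }

private:
    std::atomic<long> state_{0};
};

// Starts `threads` workers that each increment a shared counter `iterations`
// times while holding a lock of type Lock, then returns the final count.
template <typename Lock>
long run_counter(unsigned threads, long iterations) {
    Lock lock;
    long counter = 0; // protected by `lock`
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (long i = 0; i < iterations; ++i) {
                std::lock_guard<Lock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Both run_counter<SpinLock>(4, 10000) and run_counter<std::mutex>(4, 10000) return 40000. The two locks are equally correct; they differ in what a waiting thread does with its time, which is what a benchmark like the one in the question ends up measuring.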

You might also want to look into Critical Sections (less overhead than a mutex) and Slim Reader/Writer Locks (which can be used for mutual exclusion if you always acquire an exclusive lock, and which, according to my tests, are fractionally faster than critical sections in uncontended use).
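As a rough illustration of using an SRW lock purely for mutual exclusion, the sketch below only ever takes the lock in exclusive mode. The wrapper name ExclusiveSrwLock and the helper add_locked are hypothetical. The non-Windows branch is an assumption added only so the snippet also builds elsewhere; it falls back to std::mutex and says nothing about SRW performance.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#else
#include <mutex> // fallback so the sketch still builds off Windows
#endif

// Hypothetical wrapper: an SRW lock that is only ever acquired exclusively,
// which makes it behave like a lightweight in-process mutex.
class ExclusiveSrwLock {
public:
#ifdef _WIN32
    void lock()   { AcquireSRWLockExclusive(&lock_); }
    void unlock() { ReleaseSRWLockExclusive(&lock_); }

private:
    SRWLOCK lock_ = SRWLOCK_INIT; // static initializer, no cleanup call required
#else
    void lock()   { lock_.lock(); }
    void unlock() { lock_.unlock(); }

private:
    std::mutex lock_;
#endif
};

// Adds `amount` to `value` while holding the lock and returns the new value.
long add_locked(ExclusiveSrwLock& lock, long& value, long amount) {
    lock.lock();
    value += amount;
    long result = value;
    lock.unlock();
    return result;
}
```

Because the wrapper exposes lock() and unlock(), it can also be used with std::lock_guard<ExclusiveSrwLock> for scoped locking.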

You might also want to read Kenny Kerr's "The Evolution of Synchronization in Windows and C++" and Preshing's lock-related articles, here and here.
