Lightest synchronization primitive for worker thread queue

I am about to implement a worker thread with work item queuing, and while thinking about the problem I wanted to check whether I'm taking the best approach.

The thread in question will need some thread-local data (preinitialized at construction) and will loop on work items until some condition is met.

Pseudocode:

volatile bool run = true;

int WorkerThread(param)
{
    localclassinstance c1 = new c1();
    [other initialization]

    while(true) {
        [LOCK]
        [unqueue work item]
        [UNLOCK]
        if([hasWorkItem]) {
            [process data]
            [PostMessage with pointer to data]
        }
        [Sleep]

        if(!run)
            break;
    }

    [uninitialize]
    return 0;
}

I guess I will do the locking with a critical section, as the queue will be a std::vector or std::queue, but maybe there is a better way.

The part with Sleep doesn't look too great: with large Sleep values there will be a lot of unnecessary waiting, and with small Sleep values there will be lots of unnecessary locking.

But I can't think of a WaitForSingleObject-friendly primitive I could use instead of a critical section, as there might be two threads queuing work items at the same time. So an Event, which seems to be the best candidate, can lose the second work item if the Event was already set, and it doesn't guarantee mutual exclusion.

Maybe there is an even better approach using the InterlockedExchange family of functions that leads to even less serialization.

PS: I might need to preprocess the whole queue and drop obsolete work items during the unqueuing stage.

There are a multitude of ways to do this.

One option is to use a semaphore for the waiting. The semaphore is signalled every time a value is pushed onto the queue, so the worker thread will only block if there are no items in the queue. This still requires separate synchronization on the queue itself.

A second option is to use a manual-reset event which is set when there are items in the queue and cleared when the queue is empty. Again, you will need separate synchronization on the queue.

A third option is to create an invisible message-only window on the thread, and use a special WM_USER or WM_APP message to post items to the queue, attaching the item to the message via a pointer.

Another option is to use condition variables. The native Windows condition variables only work if you're targeting Windows Vista or Windows 7, but condition variables are also available for Windows XP with Boost or an implementation of the C++0x thread library. An example queue using Boost condition variables is available on my blog: http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html
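The condition-variable option could be sketched roughly as follows. This is a minimal illustration, assuming the standard C++11 `std::condition_variable` (the finalized form of the C++0x thread library) rather than Boost or the native Windows API; the names `BlockingQueue` and `produce_and_sum` are made up for the example and are not taken from the linked blog post.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical name: a minimal unbounded blocking queue.
template <class T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(value));
        }
        // Notify after unlocking so the woken thread can take the mutex at once.
        not_empty_.notify_one();
    }

    // Blocks until an item is available, then removes and returns it.
    T wait_and_pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form re-checks the condition after spurious wake-ups.
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
};

// Demo: a producer thread pushes 1..n while the caller consumes and sums them.
int produce_and_sum(int n) {
    BlockingQueue<int> queue;
    std::thread producer([&queue, n] {
        for (int i = 1; i <= n; ++i) queue.push(i);
    });
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += queue.wait_and_pop();
    producer.join();
    return sum;
}
```

The consumer never sleeps on a timer: it blocks inside `wait_and_pop` and is woken only when a producer has pushed something, which is the property the question's Sleep-based loop lacks.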

It is possible to share a resource between threads without using blocking locks at all, if your scenario meets certain requirements.

You need an atomic pointer exchange primitive, such as Win32's InterlockedExchange (InterlockedExchangePointer for pointer-sized values). Most processor architectures provide some sort of atomic swap, and it's usually much less expensive than acquiring a formal lock.

You can store your queue of work items in a pointer variable that is accessible to all the threads that will be interested in it (a global variable, or a field of an object that all the threads have access to).

This scenario assumes that the threads involved always have something to do, and only occasionally "glance" at the shared resource. If you want a design where threads block waiting for input, use a traditional blocking event object.

Before anything begins, create your queue or work item list object and assign it to the shared pointer variable.

Now, when producers want to push something onto the queue, they "acquire" exclusive access to the queue object by swapping a null into the shared pointer variable using InterlockedExchange. If the swap returns a null, then somebody else is currently modifying the queue object. Call Sleep(0) to release the rest of your thread's time slice, then loop to retry the swap until it returns non-null. Even if you end up looping a few times, this is many, many times faster than making a kernel call to acquire a mutex object. Kernel calls require hundreds of clock cycles to transition into kernel mode.

When you successfully obtain the pointer, make your modifications to the queue, then swap the queue pointer back into the shared pointer.

When consuming items from the queue, you do the same thing: swap a null into the shared pointer and loop until you get a non-null result, operate on the object through the local variable, then swap it back into the shared pointer variable.
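The acquire/modify/release cycle described above might look like this in portable C++. It is only a sketch under some assumptions: `std::atomic<T*>::exchange` stands in for InterlockedExchange, `std::this_thread::yield()` stands in for Sleep(0), and the names `SharedQueueSlot`, `WorkQueue` and `push_from_threads` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <thread>
#include <vector>

using WorkQueue = std::deque<int>;  // hypothetical work-item type: plain ints

// The slot holds the queue pointer while the queue is free,
// and nullptr while some thread has taken it.
class SharedQueueSlot {
public:
    explicit SharedQueueSlot(WorkQueue* queue) : slot_(queue) {}

    // Swap nullptr in until the real pointer comes back: the caller now owns the queue.
    WorkQueue* acquire() {
        WorkQueue* queue;
        while ((queue = slot_.exchange(nullptr, std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();  // give up the time slice, then retry
        }
        return queue;
    }

    // Put the pointer back so another thread can take it.
    void release(WorkQueue* queue) {
        slot_.store(queue, std::memory_order_release);
    }

private:
    std::atomic<WorkQueue*> slot_;
};

// Demo: several producers push concurrently; returns how many items arrived.
std::size_t push_from_threads(int thread_count, int items_per_thread) {
    WorkQueue queue;
    SharedQueueSlot slot(&queue);
    std::vector<std::thread> producers;
    for (int t = 0; t < thread_count; ++t) {
        producers.emplace_back([&slot, items_per_thread] {
            for (int i = 0; i < items_per_thread; ++i) {
                WorkQueue* q = slot.acquire();
                q->push_back(i);
                slot.release(q);
            }
        });
    }
    for (auto& producer : producers) producer.join();

    WorkQueue* q = slot.acquire();
    std::size_t count = q->size();
    slot.release(q);
    return count;
}
```

A consumer would use the same `acquire`/`release` pair around its dequeue. Because the queue is held exclusively while acquired, the consumer can also scan the whole container and drop obsolete items before releasing it.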

This technique is a combination of atomic swap and brief spin loops. It works well in scenarios where the threads involved are not blocked and collisions are rare. Most of the time the swap will give you exclusive access to the shared object on the first try, and as long as each thread holds the queue object only very briefly, no thread should have to loop more than a few times before the queue object becomes available again.

If you expect a lot of contention between threads in your scenario, or you want a design where threads spend most of their time blocked waiting for work to arrive, you may be better served by a formal mutex synchronization object.

The fastest locking primitive is usually a spin-lock or spin-sleep-lock. CRITICAL_SECTION is just such a (user-space) spin-sleep-lock. (Well, aside from not using locking primitives at all, of course. But that means using lock-free data structures, and those are really, really hard to get right.)
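To make the "spin first, then back off" idea concrete, here is a rough sketch. It is not how CRITICAL_SECTION is implemented (that blocks on a kernel object once spinning fails); this version merely yields the time slice after a number of failed attempts. The class name `SpinYieldLock` and the spin limit of 100 are arbitrary assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin-then-yield lock built on std::atomic_flag.
class SpinYieldLock {
public:
    void lock() {
        int spins = 0;
        // test_and_set returns the previous value: true means someone else holds the lock.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            if (++spins >= kSpinLimit) {
                std::this_thread::yield();  // stop burning CPU and let the holder run
                spins = 0;
            }
        }
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    static constexpr int kSpinLimit = 100;  // arbitrary value chosen for illustration
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Demo: several threads increment a shared counter under the lock.
long count_with_lock(int thread_count, int increments) {
    SpinYieldLock lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < increments; ++i) {
                // lock()/unlock() make the class usable with std::lock_guard.
                std::lock_guard<SpinYieldLock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& thread : threads) thread.join();
    return counter;
}
```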

As for avoiding the Sleep: have a look at condition variables. They're designed to be used together with a "mutex", and I think they're much easier to use correctly than Windows' EVENTs.

Boost.Thread has a nice portable implementation of both fast user-space spin-sleep-locks and condition variables:

http://www.boost.org/doc/libs/1_44_0/doc/html/thread/synchronization.html#thread.synchronization.condvar_ref

A work queue using Boost.Thread could look something like this:

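#include <cstddef>
#include <deque>
#include <boost/noncopyable.hpp>
#include <boost/thread.hpp>

template <class T>
class Queue : private boost::noncopyable
{
public:
    // m_maxSize is const, so it has to be set at construction
    explicit Queue(size_t maxSize) : m_maxSize(maxSize) {}
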
    void Enqueue(T const& t)
    {
        unique_lock lock(m_mutex);

        // wait until the queue is not full
        while (m_backingStore.size() >= m_maxSize)
            m_queueNotFullCondition.wait(lock); // releases the lock temporarily

        m_backingStore.push_back(t);
        m_queueNotEmptyCondition.notify_all(); // notify waiters that the queue is not empty
    }

    T DequeueOrBlock()
    {
        unique_lock lock(m_mutex);

        // wait until the queue is not empty
        while (m_backingStore.empty())
            m_queueNotEmptyCondition.wait(lock); // releases the lock temporarily

        T t = m_backingStore.front();
        m_backingStore.pop_front();

        m_queueNotFullCondition.notify_all(); // notify waiters that the queue is not full

        return t;
    }

private:
    typedef boost::recursive_mutex mutex;
    typedef boost::unique_lock<boost::recursive_mutex> unique_lock;

    size_t const m_maxSize;

    mutex mutable m_mutex;
    boost::condition_variable_any m_queueNotEmptyCondition;
    boost::condition_variable_any m_queueNotFullCondition;

    std::deque<T> m_backingStore;
};

There are various ways to do this.

For one, you could replace the 'run' flag with an event that the main thread signals when the worker should terminate. Instead of Sleep you would then use WaitForSingleObject with a timeout; that way the thread quits immediately instead of waiting out the remaining sleep time.
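A portable approximation of that idea is sketched below. It assumes a `std::condition_variable` with a timed wait in place of a Win32 event and WaitForSingleObject; the names `StopEvent` and `run_worker_until_stopped` are invented for the example.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a manual-reset "stop" event.
class StopEvent {
public:
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        cv_.notify_all();
    }

    // Returns true if stop was signalled, false if the timeout elapsed first.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return stopped_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopped_ = false;
};

// Demo: the worker waits with a long timeout, yet exits as soon as stop is signalled.
// Returns the number of timeouts the worker saw before stopping.
int run_worker_until_stopped() {
    StopEvent stop;
    int timeouts = 0;
    std::thread worker([&] {
        // Each timeout is the point where the worker would poll the queue.
        while (!stop.wait_for(std::chrono::seconds(10))) {
            ++timeouts;
        }
    });
    stop.signal();
    worker.join();  // returns promptly, not after 10 seconds
    return timeouts;
}
```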

Another way is to accept messages in your loop and define a custom message that you post to the thread.

EDIT: depending on the situation, it may also be wise to have yet another thread that monitors this thread to check whether it is dead. This can be done through the message queue mentioned above: a reply to a certain message within x ms would mean that the thread hasn't locked up.

I'd restructure a bit:

WorkItem GetWorkItem()
{
    while(true)
    {
        WaitForSingleObject(queue.Ready, INFINITE);
        {
            ScopeLock lock(queue.Lock);
            if(!queue.IsEmpty())
            {
                return queue.GetItem();
            }
        }
    }
}

int WorkerThread(param) 
{ 
    bool done = false;
    do
    {
        WorkItem work  = GetWorkItem();
        if( work.IsQuitMessage() )
        {
            done = true;
        }
        else
        {
            work.Process();
        }
    } while(!done);

    return 0; 
} 

Points of interest:

  1. ScopeLock is a RAII class to make critical section usage safer.
  2. Block on the event until a work item is (possibly) ready, then lock while trying to dequeue it.
  3. Don't use a global "IsDone" flag; enqueue special quit-message WorkItems instead.
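The third point, shutting the worker down with a special queued item rather than a global flag, could be sketched like this. The sketch assumes standard C++ primitives (a mutex and condition variable in place of the event and critical section), and the `WorkItem` layout with a `quit` field is a made-up illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

struct WorkItem {
    int value;  // hypothetical payload
    bool quit;  // true marks the special quit item
};

class WorkQueue {
public:
    void push(WorkItem item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(item);
        }
        ready_.notify_one();
    }

    // Blocks until an item is available.
    WorkItem pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        WorkItem item = items_.front();
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<WorkItem> items_;
};

// Worker loop: processes items in order until the quit item arrives.
int process_until_quit(WorkQueue& queue) {
    int sum = 0;
    for (;;) {
        WorkItem item = queue.pop();
        if (item.quit) break;
        sum += item.value;  // stands in for work.Process()
    }
    return sum;
}

// Demo: queue three items followed by the quit item.
int run_quit_demo() {
    WorkQueue queue;
    int total = 0;
    std::thread worker([&] { total = process_until_quit(queue); });
    for (int i = 1; i <= 3; ++i) queue.push(WorkItem{i, false});
    queue.push(WorkItem{0, true});
    worker.join();
    return total;
}
```

Because the quit request travels through the same queue as the work, every item queued before it is processed first, which a separate flag cannot guarantee.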

You can have a look at another approach here that uses C++0x atomic operations:

http://www.drdobbs.com/high-performance-computing/210604448

Use a semaphore instead of an event.

Keep the signaling and synchronizing separate. Something along these lines...

// in main thread

HANDLE events[2];
events[0] = CreateEvent(...); // for shutdown
events[1] = CreateEvent(...); // for work to do

// start thread and pass the events

// in worker thread

DWORD ret;
while (true)
{
   ret = WaitForMultipleObjects(2, events, FALSE, <timeout val or INFINITE>);

   if shutdown
      return
   else if do-work
      enter crit sec
      unqueue work
      leave crit sec
      etc.
   else if timeout
      do something else that has to be done
}

Given that this question is tagged windows, I'll answer thus:

Don't create just one worker thread. Your worker thread jobs are presumably independent, so you can process multiple jobs at once? If so:

  • In your main thread, call CreateIoCompletionPort to create an I/O completion port object.
  • Create a pool of worker threads. The number you need to create depends on how many jobs you might want to service in parallel. Some multiple of the number of CPU cores is a good start.
  • Each time a job comes in, call PostQueuedCompletionStatus(), passing a pointer to the job struct as the lpOverlapped argument.
  • Each worker thread calls GetQueuedCompletionStatus(), retrieves the work item from the lpOverlapped pointer, and does the job before returning to GetQueuedCompletionStatus().

This looks heavy, but I/O completion ports are implemented in kernel mode and represent a queue that can be deserialized into any of the worker threads associated with the queue (i.e. waiting on a call to GetQueuedCompletionStatus). The I/O completion port knows how many of the threads that are processing an item are actually using a CPU vs. blocked on an I/O call, and will release more worker threads from the pool to ensure that the concurrency count is met.

So, it's not lightweight, but it is very, very efficient. An I/O completion port can be associated with pipe and socket handles, for example, and can dequeue the results of asynchronous operations on those handles. I/O completion port designs can scale to handling tens of thousands of socket connections on a single server, but on the desktop they also make a very convenient way of scaling the processing of jobs over the 2 or 4 cores now common in desktop PCs.
