Parallel.ForEach fails to execute messages on long running IEnumerable

Why does Parallel.ForEach not finish executing a series of tasks until MoveNext returns false?

I have a tool that monitors a combination of MSMQ and Service Broker queues for incoming messages. When a message is found, it hands that message off to the appropriate executor.

I wrapped the check for messages in an IEnumerable so that I could hand Parallel.ForEach the IEnumerable plus a delegate to run. The application is designed to run continuously: IEnumerator.MoveNext loops until it finds work, and IEnumerator.Current then returns the next item.

Since MoveNext never returns false until I cancel the CancellationToken, this should continue processing forever. Instead, what I'm seeing is that once Parallel.ForEach has picked up all the messages and MoveNext is no longer returning "true", no more tasks are processed. It seems that the thread calling MoveNext is the only one given any work while it waits for MoveNext to return, and the other threads (including waiting and scheduled threads) do not do any work.

  • Is there a way to tell Parallel.ForEach to keep working while it waits for a response from MoveNext?
  • If not, is there another way to structure MoveNext to get what I want? (Having it return true and then having Current return a null object spawns a lot of bogus Tasks.)
  • Bonus question: Is there a way to limit how many messages Parallel.ForEach pulls off at once? It seems to pull off and schedule a lot of messages at once. (MaxDegreeOfParallelism only seems to limit how much work runs at once; it doesn't stop Parallel.ForEach from pulling off a lot of messages to be scheduled.)

Here is the IEnumerator I've written (without some extraneous code):

public class DataAccessEnumerator : IEnumerator<TransportMessage> 
{
    public TransportMessage Current
    {   get { return _currentMessage; } }

    public bool MoveNext()
    {
        while (_cancelToken.IsCancellationRequested == false)
        {
            TransportMessage current;
            foreach (var task in _tasks)
            {
                if (task.QueueType.ToUpper() == "MSMQ")
                    current = _msmq.Get(task.Name);
                else
                    current = _serviceBroker.Get(task.Name);

                if (current != null)
                {
                    _currentMessage = current;
                    return true;
                }
            }
            WaitHandle.WaitAny(new [] {_cancelToken.WaitHandle}, 500); 
        }

        return false; 
    }

    public DataAccessEnumerator(IDataAccess<TransportMessage> serviceBroker, IDataAccess<TransportMessage> msmq, IList<JobTask> tasks, CancellationToken cancelToken)
    {
        _serviceBroker = serviceBroker;
        _msmq = msmq;
        _tasks = tasks;
        _cancelToken = cancelToken;
    }

    private readonly IDataAccess<TransportMessage> _serviceBroker;
    private readonly IDataAccess<TransportMessage> _msmq;
    private readonly IList<JobTask> _tasks;
    private readonly CancellationToken _cancelToken;
    private TransportMessage _currentMessage;
}

Here is the Parallel.ForEach call, where _queueAccess is the IEnumerable that holds the above IEnumerator and RunJob processes a TransportMessage returned from that IEnumerator:

var parallelOptions = new ParallelOptions
    {
        CancellationToken = _cancelTokenSource.Token,
        MaxDegreeOfParallelism = 8 
    };

Parallel.ForEach(_queueAccess, parallelOptions, x => RunJob(x));

It sounds to me like Parallel.ForEach isn't really a good match for what you want to do. I suggest you use BlockingCollection<T> to create a producer/consumer queue instead: create a bunch of threads/tasks to service the blocking collection, and add work items to it as and when they arrive.
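A minimal sketch of that producer/consumer shape follows. The names QueuePump, poll, and runJob are hypothetical and not from the question: poll stands in for one pass over the MSMQ and Service Broker queues (returning null when nothing is waiting), and runJob stands in for RunJob. The bounded capacity is one possible answer to the bonus question, since the producer blocks once that many messages are waiting.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; none of these names come from the question.
public static class QueuePump
{
    // Polls for messages until 'token' is cancelled and hands each one to a worker.
    // 'poll' returns the next message, or null when every queue is empty.
    public static void Run<T>(Func<T> poll, Action<T> runJob, int workerCount,
                              CancellationToken token) where T : class
    {
        // Bounded capacity limits how many messages are pulled ahead of the workers.
        using (var pending = new BlockingCollection<T>(workerCount))
        {
            var workers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Factory.StartNew(() =>
                {
                    // Blocks while the collection is empty; ends after CompleteAdding.
                    foreach (T message in pending.GetConsumingEnumerable())
                        runJob(message);
                }, TaskCreationOptions.LongRunning);
            }

            try
            {
                while (!token.IsCancellationRequested)
                {
                    T message = poll();
                    if (message != null)
                        pending.Add(message, token);    // blocks while the collection is full
                    else
                        token.WaitHandle.WaitOne(500);  // same back-off as the question's MoveNext
                }
            }
            catch (OperationCanceledException)
            {
                // Cancelled while waiting to add; fall through to shutdown.
            }
            finally
            {
                pending.CompleteAdding();  // lets the workers drain what is left and exit
                Task.WaitAll(workers);     // rethrows worker failures as AggregateException
            }
        }
    }
}
```

With this arrangement a slow or empty poll only stalls the producer loop; messages already in the collection keep being processed by the workers.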

Your problem might be to do with the Partitioner being used.

In your case, the TPL will choose the Chunk Partitioner, which takes multiple items from the enumerable before passing them on to be processed. The number of items taken in each chunk increases over time.

When your MoveNext method blocks, the TPL is left waiting for the next item and won't process the items that it has already taken.

You have a couple of options to fix this:

1) Write a Partitioner that always returns individual items. Not as tricky as it sounds.

2) Use tasks directly instead of Parallel.ForEach:

foreach ( var item in _queueAccess )
{
    var capturedItem = item;

    Task.Factory.StartNew( () => RunJob( capturedItem ) );
}

The second solution changes the behaviour a bit. The foreach loop will complete when all the Tasks have been created, not when they have finished. If this is a problem for you, you can add a CountdownEvent:

var ce = new CountdownEvent( 1 );

foreach ( var item in _queueAccess )
{
    ce.AddCount();

    var capturedItem = item;

    Task.Factory.StartNew( () => { RunJob( capturedItem ); ce.Signal(); } );
}

ce.Signal();
ce.Wait();
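For option 1, a hand-written Partitioner may not be needed on .NET 4.5 or later: Partitioner.Create accepts EnumerablePartitionerOptions.NoBuffering, which makes each worker take one item at a time from the enumerator. The sketch below swaps that built-in in for a custom Partitioner; the UnbufferedForEach wrapper and its parameter names are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper; only Partitioner.Create and Parallel.ForEach are framework APIs.
public static class UnbufferedForEach
{
    public static void Run<T>(IEnumerable<T> source, Action<T> body,
                              int maxDegreeOfParallelism, CancellationToken token)
    {
        // NoBuffering: items are handed out one by one instead of in growing chunks,
        // so an item is processed as soon as MoveNext has produced it.
        var partitioner = Partitioner.Create(source, EnumerablePartitionerOptions.NoBuffering);

        var options = new ParallelOptions
        {
            CancellationToken = token,
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        };

        // Throws OperationCanceledException if 'token' is cancelled while the loop runs.
        Parallel.ForEach(partitioner, options, body);
    }
}
```

Applied to the question, the call would look like UnbufferedForEach.Run(_queueAccess, x => RunJob(x), 8, _cancelTokenSource.Token). Because nothing is buffered, this should also reduce how many messages are pulled off the queues ahead of the workers.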

I haven't gone to the effort of verifying this, but the impression I'd received from discussions of Parallel.ForEach was that it pulls all the items out of the enumerable and then makes appropriate decisions about how to divide them across threads. Based on your problem, that seems correct.

So, to keep most of your current code, you should probably pull the blocking code out of the iterator and place it in a loop around the call to Parallel.ForEach (which uses the iterator).
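One way that restructuring might look is sketched below. BatchLoop, Drain, poll, and runJob are hypothetical names: poll stands in for a single pass over the queues that returns null when nothing is waiting, and runJob stands in for RunJob.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; none of these names come from the question.
public static class BatchLoop
{
    // Yields messages until 'poll' finds nothing, then ends. It never blocks.
    public static IEnumerable<T> Drain<T>(Func<T> poll) where T : class
    {
        T message;
        while ((message = poll()) != null)
            yield return message;
    }

    // Repeatedly processes whatever is currently queued, sleeping between batches.
    public static void Run<T>(Func<T> poll, Action<T> runJob, ParallelOptions options)
        where T : class
    {
        CancellationToken token = options.CancellationToken;
        while (!token.IsCancellationRequested)
        {
            try
            {
                // The enumerable is finite, so this call returns once the batch is done.
                Parallel.ForEach(Drain(poll), options, runJob);
            }
            catch (OperationCanceledException)
            {
                break;  // cancelled in the middle of a batch
            }

            // The wait that used to live inside MoveNext now sits between batches.
            token.WaitHandle.WaitOne(500);
        }
    }
}
```

The trade-off is that each batch has to finish before the queues are polled again, so one slow job delays the pickup of new messages.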


 