
Parallel.ForEach fails to execute messages on long running IEnumerable

Why won't Parallel.ForEach finish executing a series of tasks until MoveNext returns false?

I have a tool that monitors a combination of MSMQ and Service Broker queues for incoming messages. When a message is found, it hands that message off to the appropriate executor.

I wrapped the check for messages in an IEnumerable so that I could hand the Parallel.ForEach method the IEnumerable plus a delegate to run. The application is designed to run continuously: IEnumerator.MoveNext loops until it is able to get work, and IEnumerator.Current then provides the next item.

Since MoveNext will never return false until I cancel the CancellationToken, this should continue to process forever. Instead, what I'm seeing is that once Parallel.ForEach has picked up all the available messages and MoveNext is no longer returning true, no more tasks are processed. It seems the thread running MoveNext is the only thread given any work while the loop waits for it to return; the other threads (including those with waiting and scheduled work) do nothing.

  • Is there a way to tell the Parallel to keep working while it waits for a response from the MoveNext?
  • If not, is there another way to structure the MoveNext to get what I want? (having it return true and then the Current returning a null object spawns a lot of bogus Tasks)
  • Bonus Question: Is there a way to limit how many messages the Parallel pulls off at once? It seems to pull off and schedule a lot of messages at once (MaxDegreeOfParallelism only seems to limit how much work it does concurrently; it doesn't stop it from pulling off a lot of messages to be scheduled).

Here is the IEnumerator I've written (with some extraneous code omitted):

public class DataAccessEnumerator : IEnumerator<TransportMessage> 
{
    public TransportMessage Current
    {   get { return _currentMessage; } }

    public bool MoveNext()
    {
        while (_cancelToken.IsCancellationRequested == false)
        {
            TransportMessage current;
            foreach (var task in _tasks)
            {
                if (task.QueueType.ToUpper() == "MSMQ")
                    current = _msmq.Get(task.Name);
                else
                    current = _serviceBroker.Get(task.Name);

                if (current != null)
                {
                    _currentMessage = current;
                    return true;
                }
            }
            WaitHandle.WaitAny(new [] {_cancelToken.WaitHandle}, 500); 
        }

        return false; 
    }

    public DataAccessEnumerator(IDataAccess<TransportMessage> serviceBroker, IDataAccess<TransportMessage> msmq, IList<JobTask> tasks, CancellationToken cancelToken)
    {
        _serviceBroker = serviceBroker;
        _msmq = msmq;
        _tasks = tasks;
        _cancelToken = cancelToken;
    }

    private readonly IDataAccess<TransportMessage> _serviceBroker;
    private readonly IDataAccess<TransportMessage> _msmq;
    private readonly IList<JobTask> _tasks;
    private readonly CancellationToken _cancelToken;
    private TransportMessage _currentMessage;
}

Here is the Parallel.ForEach call where _queueAccess is the IEnumerable that holds the above IEnumerator and RunJob processes a TransportMessage that is returned from that IEnumerator:

var parallelOptions = new ParallelOptions
    {
        CancellationToken = _cancelTokenSource.Token,
        MaxDegreeOfParallelism = 8 
    };

Parallel.ForEach(_queueAccess, parallelOptions, x => RunJob(x));

It sounds to me like Parallel.ForEach isn't really a good match for what you want to do. I suggest you use BlockingCollection<T> to create a producer/consumer queue instead - create a bunch of threads/tasks to service the blocking collection, and add work items to it as and when they arrive.
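
Something along these lines should work (a minimal sketch only, reusing the _tasks, _msmq, _serviceBroker, _cancelTokenSource and RunJob members from the question; the bounded capacity of 100 and the 8 consumer tasks are arbitrary choices):

// requires System.Collections.Concurrent, System.Linq, System.Threading.Tasks
var queue = new BlockingCollection<TransportMessage>(100);   // bounded so the producer can't run away

// Producer: polls the queues (same logic as the MoveNext above) and adds work as it arrives.
var producer = Task.Factory.StartNew(() =>
{
    while (!_cancelTokenSource.IsCancellationRequested)
    {
        foreach (var task in _tasks)
        {
            var message = task.QueueType.ToUpper() == "MSMQ"
                ? _msmq.Get(task.Name)
                : _serviceBroker.Get(task.Name);

            if (message != null)
                queue.Add(message);
        }

        _cancelTokenSource.Token.WaitHandle.WaitOne(500);     // same 500 ms poll interval as MoveNext
    }

    queue.CompleteAdding();                                   // lets the consumers drain and exit
});

// Consumers: a fixed number of workers that block until an item is available.
var consumers = Enumerable.Range(0, 8)
    .Select(i => Task.Factory.StartNew(() =>
    {
        foreach (var message in queue.GetConsumingEnumerable())
            RunJob(message);
    }))
    .ToArray();

Task.WaitAll(consumers);

This keeps the blocking poll on a single producer thread, while the consumers block cheaply inside GetConsumingEnumerable until work arrives.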

Your problem might be to do with the Partitioner being used.

In your case, the TPL will choose the Chunk Partitioner, which takes multiple items from the enumerable before passing them on to be processed. The number of items taken in each chunk will increase with time.

When your MoveNext method blocks, the TPL is left waiting for the next item and won't process the items that it has already taken.

You have a couple of options to fix this:

1) Write a Partitioner that always returns individual items. Not as tricky as it sounds.
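
For illustration, a single-item partitioner might look something like this (a sketch only; the SingleItemPartitioner class and its locking scheme are illustrative, not an existing type from the question or the framework):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hands out one item at a time, so nothing is buffered while MoveNext blocks.
public class SingleItemPartitioner<T> : Partitioner<T>
{
    private readonly IEnumerable<T> _source;

    public SingleItemPartitioner(IEnumerable<T> source)
    {
        _source = source;
    }

    public override bool SupportsDynamicPartitions
    {
        get { return true; }
    }

    public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
    {
        var shared = GetDynamicPartitions();
        return Enumerable.Range(0, partitionCount)
                         .Select(i => shared.GetEnumerator())
                         .ToList();
    }

    public override IEnumerable<T> GetDynamicPartitions()
    {
        return new SharedEnumerable(_source.GetEnumerator());
    }

    // All partitions pull from the same underlying enumerator; a lock ensures only
    // one worker is inside MoveNext at a time, and each take is a single item.
    private class SharedEnumerable : IEnumerable<T>
    {
        private readonly IEnumerator<T> _enumerator;
        private readonly object _lock = new object();

        public SharedEnumerable(IEnumerator<T> enumerator)
        {
            _enumerator = enumerator;
        }

        public IEnumerator<T> GetEnumerator()
        {
            while (true)
            {
                T item;
                lock (_lock)
                {
                    if (!_enumerator.MoveNext())
                        break;
                    item = _enumerator.Current;
                }
                yield return item;    // yield outside the lock
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}

The call would then become:

Parallel.ForEach(new SingleItemPartitioner<TransportMessage>(_queueAccess), parallelOptions, x => RunJob(x));

(On .NET 4.5 or later, Partitioner.Create(_queueAccess, EnumerablePartitionerOptions.NoBuffering) should give the same one-item-at-a-time behaviour without a custom class.)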

2) Use the TPL instead of Parallel.ForEach:

foreach ( var item in _queueAccess )
{
    var capturedItem = item;

    Task.Factory.StartNew( () => RunJob( capturedItem ) );
}

The second solution changes the behaviour a bit. The foreach loop will complete when all the Tasks have been created, not when they have finished. If this is a problem for you, you can add a CountdownEvent:

var ce = new CountdownEvent( 1 );

foreach ( var item in _queueAccess )
{
    ce.AddCount();

    var capturedItem = item;

    Task.Factory.StartNew( () => { RunJob( capturedItem ); ce.Signal(); } );
}

ce.Signal();
ce.Wait();

I haven't gone to the effort of verifying this, but the impression I'd received from discussions of Parallel.ForEach was that it would pull all the items out of the enumerable and then make appropriate decisions about how to divide them across threads. Based on your problem, that seems correct.

So, to keep most of your current code, you should probably pull the blocking code out of the iterator and place it into a loop around the call to Parallel.ForEach (which uses the iterator).
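
For example (a rough sketch; WaitForMessages is a hypothetical helper, not code from the question, that would contain the blocking poll currently in MoveNext and return whatever messages are available as a finite list):

while (!_cancelTokenSource.IsCancellationRequested)
{
    // Block here, outside the iterator, until at least one message is available.
    List<TransportMessage> batch = WaitForMessages();

    // The collection is finite, so this call returns once the batch is processed,
    // and the outer loop goes back to waiting for more work.
    Parallel.ForEach(batch, parallelOptions, x => RunJob(x));
}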
