Completed Event Handler for Task.Factory.StartNew(() => Parallel.ForEach

Question

I want to know when some parallel tasks are completed.

I'm using this code to make between 1500 and 2000 small WebClient.DownloadString calls, with a 10-second HttpRequest timeout, against a website:

Task.Factory.StartNew(() => 
    Parallel.ForEach<string>(myKeywords, new ParallelOptions 
    { MaxDegreeOfParallelism = 5 }, getKey));

Sometimes a query fails and throws an exception, and the function never finishes. The UI refresh inside each getKey call also sometimes seems to be called twice, so I can't get an accurate idea of how many tasks have completed. I'm calculating (number of UI refresh calls) / (total number of keywords) and get a result between 100% and 250%, so I never know when the tasks are completed. I searched a lot of SO discussions, but none offered a direct method or one that suits my needs. So I guess Framework 4.0 doesn't provide any Tasks.AllCompleted event handler or similar workaround?

Should I run my Parallel.ForEach on another thread instead of my UI thread, and then add the following?

myTasks.WaitAll

[EDIT]

A temporary solution was to copy my list of strings into an ArrayList, then remove each item from the list at the beginning of its query. Whether or not the function succeeds, I know when all items have been processed.

Answer 1

Parallel.ForEach is no different from other loops when it comes to handling exceptions. If an exception is thrown, it stops processing the loop. This is probably why you're seeing variances in the percentages (I assume you're updating the count as you process the loop).
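As a minimal sketch of one way to keep the count accurate, assuming each failure should be recorded rather than allowed to escape the loop body: catch inside the body and count with Interlocked. KeywordRunner, ProcessAll, and the work delegate are hypothetical names standing in for the getKey call from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original code.
static class KeywordRunner
{
    // Runs `work` for every keyword and returns how many items were
    // attempted. Exceptions are collected in `errors` instead of escaping.
    public static int ProcessAll(
        IEnumerable<string> keywords,
        Action<string> work,
        ConcurrentQueue<Exception> errors)
    {
        int completed = 0;

        Parallel.ForEach(
            keywords,
            new ParallelOptions { MaxDegreeOfParallelism = 5 },
            keyword =>
            {
                try
                {
                    work(keyword);
                }
                catch (Exception ex)
                {
                    // Record the failure so the remaining items still run.
                    errors.Enqueue(ex);
                }
                finally
                {
                    // Exactly one thread-safe increment per item,
                    // whether it succeeded or failed.
                    Interlocked.Increment(ref completed);
                }
            });

        return completed;
    }
}
```

Because the increment sits in the finally block, the returned count equals the number of keywords, and the errors queue tells you which of them failed.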

Also, you don't really need Parallel.ForEach, because the calls you're making on the WebClient class spend their time blocked waiting on IO completion (the network responses); they are not computationally bound (Parallel.ForEach is much better suited to computationally bound work).

That said, you should first translate your calls to WebClient to use Task<TResult>. Translating the event-based asynchronous pattern to the task-based asynchronous pattern is simple with the use of the TaskCompletionSource<TResult> class.

Assuming that you have a sequence of Uri instances that are produced as a result of your calls to getKey, you can create a function to do this:

static Task<String> DownloadStringAsync(Uri uri)
{
    // Create a WebClient
    var wc = new WebClient();

    // Set up your web client.

    // Create the TaskCompletionSource.
    var tcs = new TaskCompletionSource<string>();

    // Set the event handler on the web client.
    wc.DownloadStringCompleted += (s, e) => {
        // Dispose of the WebClient when done.
        using (wc)
        {
            // Set the task completion source based on the
            // event.
            if (e.Cancelled)
            {
                // Set cancellation.
                tcs.SetCanceled();
                return;
            }

            // Exception?
            if (e.Error != null)
            { 
                // Set exception.
                tcs.SetException(e.Error);
                return;
            }

            // Set result.
            tcs.SetResult(e.Result);
        }
    };

    // Start the download; the handler above runs when it finishes.
    wc.DownloadStringAsync(uri);

    // Return the task.
    return tcs.Task;
}

Note that the above can be optimized to use one WebClient; that is left as an exercise for you (assuming your tests show you need it).

From there, you can get a sequence of Task<string>:

// Gotten from myKeywords
IEnumerable<Uri> uris = ...;

// The tasks.
Task<string>[] tasks = uris.Select(DownloadStringAsync).ToArray();

Note that you must call the ToArray extension method, or something else that enumerates the entire sequence, for the tasks to start running. This gets around deferred execution: Select on its own only describes the work and does not call DownloadStringAsync until the sequence is enumerated.
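To illustrate deferred execution on its own, here is a small sketch that counts how often a Select selector runs before and after ToArray. DeferredDemo and CountSelectorCalls are hypothetical names used only for this illustration, and the integer selector stands in for DownloadStringAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, for illustration only.
static class DeferredDemo
{
    // Returns { selector calls before ToArray, selector calls after
    // ToArray, number of results }.
    public static int[] CountSelectorCalls()
    {
        int calls = 0;

        // Building the query does not run the selector.
        IEnumerable<int> query =
            new[] { 1, 2, 3 }.Select(x => { calls++; return x * 2; });
        int before = calls;

        // Enumerating the query runs the selector once per element.
        int[] results = query.ToArray();
        int after = calls;

        return new[] { before, after, results.Length };
    }
}
```

Enumerating the same query a second time would run the selector again, which in the download case would start every request twice. Holding on to the array returned by ToArray avoids that.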

Once you have these Task<string> instances, you can wait on them all to complete by calling the ContinueWhenAll<TAntecedentResult> method on the TaskFactory class, like so:

Task.Factory.ContinueWhenAll(tasks, a => { }).Wait();

When this is done, you can cycle through the tasks array and look at the Exception and/or Result properties to check to see what the exception or result was.
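A minimal sketch of that inspection step, assuming a per-status count is all you need. TaskReport, Summarize, and WhenAllSummarized are hypothetical names; the second method uses ContinueWhenAll as a completion callback rather than blocking with Wait.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
static class TaskReport
{
    // Counts finished tasks by outcome. Call this only once all tasks
    // have completed.
    public static string Summarize(Task<string>[] tasks)
    {
        int ok = 0, failed = 0, canceled = 0;

        foreach (Task<string> t in tasks)
        {
            switch (t.Status)
            {
                case TaskStatus.RanToCompletion:
                    ok++;        // t.Result is safe to read here
                    break;
                case TaskStatus.Faulted:
                    failed++;    // t.Exception holds the AggregateException
                    break;
                case TaskStatus.Canceled:
                    canceled++;
                    break;
            }
        }

        return string.Format(
            "{0} ok, {1} failed, {2} canceled", ok, failed, canceled);
    }

    // Returns a task that completes with the summary once every input
    // task has finished, without blocking the calling thread.
    public static Task<string> WhenAllSummarized(Task<string>[] tasks)
    {
        return Task.Factory.ContinueWhenAll<string, string>(
            tasks, finished => Summarize(finished));
    }
}
```

The continuation plays the role of the "all completed" event handler asked about in the question. Since Wait blocks the calling thread, the non-blocking form is likely the better fit if the call originates on the UI thread.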

If you are updating a user interface, then you should look at intercepting the call to Enumerable.Select; namely, you should call the ContinueWith<TNewResult> method on the Task<TResult> to perform an operation when that download is complete, like so:

// The tasks.
Task<string>[] tasks = uris.
    Select(DownloadStringAsync).
    // Select receives a Task<T> here, continue that.
    Select(t => t.ContinueWith(t2 => {
        // Do something here: 
        //   - increment a count
        //   - fire an event
        //   - update the UI
        // Note that you have to take care of synchronization here, so
        // make sure to synchronize access to a count, or serialize calls
        // to the UI thread appropriately with a SynchronizationContext.
        ...

        // Return the antecedent task. The continuation is then a
        // Task<Task<string>>, which Unwrap flattens back into a
        // Task<string> with the same result or exception.
        return t2;
    }).Unwrap()).
    ToArray();

This will allow you to update things as they happen. Note that in the above case, if you call Select again, you might want to check the state of t2 and fire some other events, depending on what you want your error handling mechanism to be.
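As a sketch of the "increment a count" option, assuming progress is reported through a callback: the helper below wraps each task so the callback fires exactly once per finished download. ProgressTracking, WithProgress, and onProgress are hypothetical names, and Unwrap is used so the wrapped tasks remain Task<string>.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
static class ProgressTracking
{
    // Wraps each task so that onProgress(done, total) is called once
    // when that task finishes, whether it succeeded, failed, or was
    // canceled.
    public static Task<string>[] WithProgress(
        Task<string>[] tasks, Action<int, int> onProgress)
    {
        int done = 0;
        int total = tasks.Length;

        return tasks
            .Select(t => t.ContinueWith(t2 =>
            {
                // Thread-safe increment: continuations may run
                // concurrently on different threads.
                onProgress(Interlocked.Increment(ref done), total);

                // Hand back the original task so its outcome is kept.
                return t2;
            }).Unwrap())
            .ToArray();
    }
}
```

The callback here runs on whatever thread executes the continuation. To touch UI controls from it, one option is the ContinueWith overload that takes a TaskScheduler, passing TaskScheduler.FromCurrentSynchronizationContext() captured on the UI thread.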
