How to do multi-threading with asynchronous webrequests

I'm trying to implement a .NET 4 helper/utility class that retrieves HTML page sources based on a URL list, for a web-testing tool. The solution should be scalable and have high performance.

I have been researching and trying different solutions for many days now, but cannot find a proper one.

Based on my understanding, the best way to achieve my goal would be to use asynchronous web requests running in parallel using the TPL.

In order to have full control over headers etc., I'm using HttpWebRequest instead of WebClient, which wraps HttpWebRequest. In some cases the output should be chained to other tasks, so using TPL tasks could make sense.

What I have achieved so far, after many different trials/approaches:

  1. Implemented basic synchronous, asynchronous (APM) and parallel (using TPL tasks) solutions to compare the performance levels of the different approaches.

  2. To see the performance of an asynchronous parallel solution I used the APM approach, BeginGetResponse and BeginRead, and ran it inside Parallel.ForEach. Everything works fine and I'm happy with the performance. Somehow I feel that using a simple Parallel.ForEach is not the way to go; for example, I don't know how I would use task chaining.

  3. Then I tried a more sophisticated system, using tasks to wrap the APM solution with TaskCompletionSource and an iterator to iterate through the APM flow (a minimal sketch of such a wrapper follows this list). I believe this solution could be what I'm looking for, but there is a strange delay of roughly 6-10 seconds, which happens 2-3 times when running a list of 500 URLs.

    Based on the logs, execution has gone back to the thread that calls the async fetch in a loop when the delay happens. The delay doesn't occur every time execution moves back to the loop, just 2-3 times; otherwise it works fine. It looks as if the looping thread creates a set of tasks that are processed by other threads, and although most/all of the tasks are completed, there is a delay (6-8 seconds) before the loop continues creating the remaining tasks and the other threads become active again.
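For illustration, here is a minimal sketch of the kind of TaskCompletionSource wrapper around BeginGetResponse mentioned in point 3. The helper name DownloadPageAsync is my own (hypothetical), and the blocking ReadToEnd at the end is a simplification; a fully asynchronous version would chain BeginRead/EndRead instead:

    using System;
    using System.IO;
    using System.Net;
    using System.Threading.Tasks;

    static Task<string> DownloadPageAsync(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        var tcs = new TaskCompletionSource<string>();

        request.BeginGetResponse(asyncResult =>
        {
            try
            {
                // EndGetResponse completes the APM call; any exception is
                // routed into the task instead of thrown on this thread.
                using (var response = request.EndGetResponse(asyncResult))
                using (var reader = new StreamReader(response.GetResponseStream()))
                {
                    tcs.SetResult(reader.ReadToEnd());
                }
            }
            catch (Exception ex)
            {
                tcs.SetException(ex);
            }
        }, null);

        return tcs.Task;
    }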

The principle of the iterator inside the loop is:

IEnumerable<Task> DoExample(string input)
{
    var aResult = DoAAsync(input);
    yield return aResult;
    var bResult = DoBAsync(aResult.Result);
    yield return bResult;
    var cResult = DoCAsync(bResult.Result);
    yield return cResult;
    …
}

Task t = Iterate(DoExample("42"));
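The Iterate driver itself is not shown above; roughly, it walks the enumerator and schedules each step as a continuation of the task the previous step yielded. A sketch of such a driver, paraphrased from Stephen Toub's pattern rather than his exact code:

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    static Task Iterate(IEnumerable<Task> asyncIterator)
    {
        var enumerator = asyncIterator.GetEnumerator();
        var tcs = new TaskCompletionSource<object>();

        Action<Task> recursiveBody = null;
        recursiveBody = completedTask =>
        {
            if (completedTask != null && completedTask.IsFaulted)
            {
                // A failed step faults the overall task.
                tcs.TrySetException(completedTask.Exception.InnerExceptions);
            }
            else if (enumerator.MoveNext())
            {
                // Schedule the next step when the yielded task completes.
                enumerator.Current.ContinueWith(recursiveBody,
                    TaskContinuationOptions.ExecuteSynchronously);
            }
            else
            {
                tcs.TrySetResult(null);
            }
        };
        recursiveBody(null);
        return tcs.Task;
    }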

I'm resolving the connection limit by using System.Net.ServicePointManager.DefaultConnectionLimit, and implementing the timeout using ThreadPool.RegisterWaitForSingleObject (see the sketch below).
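A minimal sketch of those two pieces, with an arbitrary 100-connection limit and 30-second timeout assumed (StartWithTimeout is a hypothetical helper name):

    using System;
    using System.Net;
    using System.Threading;

    static void StartWithTimeout(HttpWebRequest request, AsyncCallback callback)
    {
        // The default is 2 concurrent connections per host, which throttles
        // parallel downloads; raise it before issuing requests.
        ServicePointManager.DefaultConnectionLimit = 100;

        IAsyncResult asyncResult = request.BeginGetResponse(callback, request);

        // Abort the request if it has not completed within 30 seconds;
        // the request's Timeout property has no effect on BeginGetResponse.
        ThreadPool.RegisterWaitForSingleObject(
            asyncResult.AsyncWaitHandle,
            (state, timedOut) => { if (timedOut) ((HttpWebRequest)state).Abort(); },
            request,
            TimeSpan.FromSeconds(30),
            true); // executeOnlyOnce
    }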

My question is simply: what would be the best approach to implement a helper/utility class for retrieving HTML pages that would:

  • be scalable and have high performance
  • use web requests
  • be easily chained to other tasks
  • support timeouts
  • use the .NET 4 framework

If you think that the solution of using APM, TaskCompletionSource and an iterator, which I presented above, is fine, I would appreciate any help in trying to solve the delay problem.

I'm totally new to C# and Windows development, so please don't mind if something I'm trying out doesn't make much sense.

Any help would be highly appreciated, as without getting this solved I will have to drop my test-tool development.

Thanks

Using iterators was a great solution in pre-TPL .NET (e.g., the Coordination and Concurrency Runtime (CCR) out of MS Robotics made heavy use of them and helped inspire the TPL). One problem is that iterators alone aren't going to give you what you need - you also need a scheduler to distribute the workload effectively. That's almost done by Stephen Toub's snippet that you linked to - but note this one line:

enumerator.Current.ContinueWith(recursiveBody, TaskContinuationOptions.ExecuteSynchronously);

I think the intermittent problems you're seeing might be linked to forcing "ExecuteSynchronously" - it could be causing an uneven distribution of work across the available cores/threads.

Take a look at some of the other alternatives that Stephen proposes in his blog article. In particular, see what just doing a simple chaining of ContinueWith() calls will do (if necessary, followed by matching Unwrap() calls). The syntax won't be the prettiest, but it's the simplest and interferes as little as possible with the underlying work-stealing runtime, so you'll hopefully get better results.
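For example, assuming DoAAsync, DoBAsync and DoCAsync each return a Task<string> as in the iterator above, the plain chaining looks roughly like this (Unwrap is the extension method from System.Threading.Tasks.TaskExtensions):

    // Each ContinueWith returns a Task<Task<string>>; Unwrap flattens it
    // back to a Task<string> representing the inner operation.
    Task<string> result =
        DoAAsync("42")
            .ContinueWith(a => DoBAsync(a.Result)).Unwrap()
            .ContinueWith(b => DoCAsync(b.Result)).Unwrap();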
