
A better approach to managing multiple WebRequests

I have a component that processes multiple web requests, each in a separate thread. Each WebRequest is processed synchronously.

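using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading;

public class WebRequestProcessor : System.ComponentModel.Component
{
    // URLs to process; filled in elsewhere.
    List<string> urlList = new List<string>();
    // Keeps the started threads (not the workers) so they can be tracked.
    List<Thread> tlist = new List<Thread>();

    public void Start()
    {
        foreach (string url in urlList)
        {
            // Create the thread object. This does not start the thread.
            Worker workerObject = new Worker();
            Thread workerThread = new Thread(workerObject.DoWork);

            // Start the worker thread, passing the URL as its argument.
            workerThread.Start(url);
            tlist.Add(workerThread);
        }
    }
}

public class Worker
{
    // This method will be called when the thread is started.
    // A parameterized thread start passes its argument as object.
    public void DoWork(object state)
    {
        string url = (string)state;

        // prepare the web page we will be asking for
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        // execute the request; dispose the response when done
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        // we will read data via the response stream
        using (Stream resStream = response.GetResponseStream())
        {
            // process stream
        }
    }
}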

Now I need to find the best way to cancel all requests.

One way is to convert each synchronous WebRequest into an asynchronous one and use WebRequest.Abort to cancel processing.

Another way is to release the thread references and let the threads die, leaving the cleanup to the GC.

If you want to download 1000 files, starting 1000 threads at once is certainly not the best option. Not only will it probably give you no speedup compared with downloading just a few files at a time, it will also require at least 1 GB of virtual memory, because each thread reserves 1 MB of stack by default. Creating threads is expensive, so try to avoid doing it in a loop.

What you should do instead is use Parallel.ForEach() along with the asynchronous versions of the request and response operations. For example (WPF code):

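// Shared by both click handlers so Cancel_Click can cancel the running download.
private CancellationTokenSource m_tokenSource;

private void Start_Click(object sender, RoutedEventArgs e)
{
    m_tokenSource = new CancellationTokenSource();
    var urls = …;
    Task.Factory.StartNew(() => Start(urls, m_tokenSource.Token), m_tokenSource.Token);
}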

private void Cancel_Click(object sender, RoutedEventArgs e)
{
    m_tokenSource.Cancel();
}

void Start(IEnumerable<string> urlList, CancellationToken token)
{
    Parallel.ForEach(urlList, new ParallelOptions { CancellationToken = token },
                     url => DownloadOne(url, token));

}

void DownloadOne(string url, CancellationToken token)
{
    ReportStart(url);

    try
    {
        var request = WebRequest.Create(url);

        var asyncResult = request.BeginGetResponse(null, null);

        WaitHandle.WaitAny(new[] { asyncResult.AsyncWaitHandle, token.WaitHandle });

        if (token.IsCancellationRequested)
        {
            request.Abort();
            return;
        }

        // Dispose the response even if reading throws.
        using (var response = request.EndGetResponse(asyncResult))
        using (var stream = response.GetResponseStream())
        {
            byte[] bytes = new byte[4096];

            while (true)
            {
                asyncResult = stream.BeginRead(bytes, 0, bytes.Length, null, null);

                // Wait until either the read completes or cancellation is requested.
                WaitHandle.WaitAny(new[] { asyncResult.AsyncWaitHandle,
                                           token.WaitHandle });

                if (token.IsCancellationRequested)
                    break;

                var read = stream.EndRead(asyncResult);

                // A read of 0 bytes means the end of the stream.
                if (read == 0)
                    break;

                // do something with the downloaded bytes
            }
        }
    }
    finally
    {
        ReportFinish(url);
    }
}

This way, when you cancel the operation, all downloads are canceled and no new ones are started. You probably also want to set MaxDegreeOfParallelism on ParallelOptions, so that you aren't running too many downloads at once.

I'm not sure what you want to do with the files you are downloading; depending on that, using StreamReader might be a better option.
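As a rough sketch of the MaxDegreeOfParallelism suggestion: the helper below (ThrottledLoop.Run is a hypothetical name, not part of the answer's code) passes both the token and a concurrency cap through ParallelOptions. It also counts how many bodies run at the same time, purely to make the cap observable.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledLoop
{
    // Hypothetical helper: runs `work` for every item with at most
    // `maxParallel` bodies in flight, and returns the highest number of
    // bodies that were actually observed running at the same time.
    public static int Run(IEnumerable<string> items, int maxParallel,
                          CancellationToken token, Action<string> work)
    {
        var options = new ParallelOptions
        {
            CancellationToken = token,
            MaxDegreeOfParallelism = maxParallel
        };

        object gate = new object();
        int current = 0;
        int peak = 0;

        Parallel.ForEach(items, options, item =>
        {
            // Bookkeeping only: track how many bodies are running right now.
            lock (gate)
            {
                current++;
                if (current > peak) peak = current;
            }

            try
            {
                work(item);
            }
            finally
            {
                lock (gate) { current--; }
            }
        });

        return peak;
    }
}
```

In the download code above, the equivalent change would be adding MaxDegreeOfParallelism to the ParallelOptions object that Start() already creates; the right value depends on the server and the network, so it is best treated as something to measure.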
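If the downloaded content is text, the StreamReader variant could look roughly like this. ResponseText.ReadAll is a hypothetical helper, and UTF-8 is an assumption; a real response may declare a different charset.

```csharp
using System.IO;
using System.Text;

public static class ResponseText
{
    // Hypothetical helper: reads the whole stream as text and disposes it.
    // Assumes UTF-8 content.
    public static string ReadAll(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

It would be called as ResponseText.ReadAll(response.GetResponseStream()). Note that ReadToEnd blocks until the stream ends, so this form gives up the per-read cancellation check that the byte loop performs; canceling a read in progress would then rely on request.Abort().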

I think the best solution is "Parallel ForEach cancellation". Please check the following code.

  1. To implement cancellation, first create a CancellationTokenSource and pass its token to Parallel.ForEach through ParallelOptions.
  2. When you want to cancel, call CancellationTokenSource.Cancel().
  3. After cancellation, an OperationCanceledException is thrown, which you need to handle.

There is a good article about parallel programming related to my answer: Task Parallel Library by Sacha Barber on CodeProject.

CancellationTokenSource tokenSource = new CancellationTokenSource();
ParallelOptions options = new ParallelOptions()
{
    CancellationToken = tokenSource.Token
};

// Assumed to hold the URLs to download.
List<string> urlList = new List<string>();

// Parallel.ForEach cancellation
try
{
    Parallel.ForEach(urlList, options, url =>
    {
        // Each iteration runs one synchronous download on a pool thread.
        Worker workerObject = new Worker();
        workerObject.DoWork(url);
    });
}
catch (OperationCanceledException)
{
    Console.WriteLine("Operation Cancelled");
}

UPDATED

The following code is "Parallel Foreach Cancellation Sample Code".

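using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        List<int> data = Enumerable.Range(1, 10000).ToList();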

        CancellationTokenSource tokenSource = new CancellationTokenSource();

        Task cancelTask = Task.Factory.StartNew(() =>
            {
                Thread.Sleep(1000);
                tokenSource.Cancel();
            });


        ParallelOptions options = new ParallelOptions()
        {
            CancellationToken = tokenSource.Token
        };


        //parallel foreach cancellation
        try
        {
            Parallel.ForEach(data,options, (x, state) =>
            {
                Console.WriteLine(x);
                Thread.Sleep(100);
            });
        }
        catch (OperationCanceledException ex)
        {
            Console.WriteLine("Operation Cancelled");
        }


        Console.ReadLine();
    }
}

Another approach is to use Thread.Abort; see "Implement C# Generic Timeout", and consider spawning an AppDomain that you can kill, as mentioned by Marc Gravell.
