
Parallel.For and httpclient crash the application C#

I want to stop my application from crashing when I combine a Parallel.For loop with HttpClient, but my limited programming knowledge keeps me from applying the solutions I have found elsewhere on the web. My code is pasted below.

class Program
    {
        public static List<string> words = new List<string>();
        public static int count = 0;
        public static string output = "";
        private static HttpClient Client = new HttpClient();
        public static void Main(string[] args)
        {
            //input path strings...
            List<string> links = new List<string>();
            links.AddRange(File.ReadAllLines(input));
            List<string> longList = new List<string>(File.ReadAllLines(@"a.txt"));
            words.AddRange(File.ReadAllLines(output1));
            System.Net.ServicePointManager.DefaultConnectionLimit = 8;
            count = longList.Count;
            //for (int i = 0; i < longList.Count; i++)
            Task.Run(() => Parallel.For(0, longList.Count, new ParallelOptions { MaxDegreeOfParallelism = 5 }, (i, loopState) =>
            {
                Console.WriteLine(i);
                string link = @"some link" + longList[i] + "/";
                try
                {
                    if (!links.Contains(link))
                    {
                        Task.Run(async () => { await Download(link); }).Wait();
                    }
                }
                catch (System.Exception e)
                {

                }
            }));
            //}

        }
        public static async Task Download(string link)
        {
            HtmlAgilityPack.HtmlDocument document = new HtmlDocument();
            document.LoadHtml(await getURL(link));
            //...stuff with html agility pack
        }
        public static async Task<string> getURL(string link)
        {
            string result = "";
            HttpResponseMessage response = await Client.GetAsync(link);
            Console.WriteLine(response.StatusCode);
            if(response.IsSuccessStatusCode)
            {
                HttpContent content = response.Content;
                var bytes = await response.Content.ReadAsByteArrayAsync();
                result = Encoding.UTF8.GetString(bytes);
            }
            return result;
        }

    }

There are solutions, for example this one, but I don't know how to use the await keyword in my Main method, and currently the program simply exits because nothing awaits the Task.Run() call. As you can see, I have already applied a workaround so that the async Download() method can be called from synchronous code. I also have doubts about using the same HttpClient instance across parallel threads. Please advise whether I should create a new HttpClient instance each time.

You're right that a console application has to block on a task somewhere; otherwise the program exits before the work completes. But you're blocking in more places than you need to. Aim to block only once, on the main thread, and delegate everything else to an async method. A good practice is to create a method with a signature like private static async Task MainAsync(string[] args), put the "guts" of your program logic there, and call it from Main like this:

MainAsync(args).Wait();
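For context, here is a minimal sketch of how that pattern might look as a complete program. DoWorkAsync is a hypothetical placeholder for the real program logic, not something from the question. On C# 7.1 or later you could, as far as I know, declare static async Task Main instead and drop the blocking call entirely.

```csharp
using System;
using System.Threading.Tasks;

public static class Program
{
    public static void Main(string[] args)
    {
        // Block exactly once, at the entry point, so the process stays
        // alive until all async work has finished. Note that Wait() wraps
        // any exception from MainAsync in an AggregateException.
        MainAsync(args).Wait();
    }

    private static async Task MainAsync(string[] args)
    {
        // From here on, await can be used freely.
        string result = await DoWorkAsync();
        Console.WriteLine(result);
    }

    // Hypothetical stand-in for the real work (reading files, downloading pages).
    public static async Task<string> DoWorkAsync()
    {
        await Task.Delay(100); // simulates an I/O-bound operation
        return "done";
    }
}
```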

In your example, move everything from Main to MainAsync. Then you're free to use await as much as you want. Task.Run and Parallel.For explicitly tie up threads for I/O-bound work, which is unnecessary in the async world. Use Task.WhenAll instead. The last part of your MainAsync method should end up looking something like this:

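await Task.WhenAll(longList.Select(async (s, i) =>
{
    // The indexed Select overload supplies i; the original loop variable no longer exists here.
    Console.WriteLine(i);
    string link = @"some link" + s + "/";
    try
    {
        if (!links.Contains(link))
        {
            await Download(link);
        }
    }
    catch (System.Exception e)
    {
        // Report the failure instead of silently swallowing it.
        Console.WriteLine(link + " failed: " + e.Message);
    }
}));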

There is one little wrinkle here though. Your original code caps the parallelism at 5, whereas Task.WhenAll as written above starts every download at once. If you find you still need the cap, TPL Dataflow is a great library for throttled parallelism in the async world. Here's a simple example.
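As a rough illustration of throttling with TPL Dataflow, the sketch below uses an ActionBlock with MaxDegreeOfParallelism. The names ThrottleSketch, ForEachThrottledAsync and DemoAsync are made up for this example, and the Task.Delay call stands in for a real download. It assumes the System.Threading.Tasks.Dataflow package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ThrottleSketch
{
    // Runs `work` for every item with at most `maxParallel` invocations in flight.
    // If `work` throws, the block faults and stops processing, so the delegate
    // should catch its own exceptions (as the try/catch in the answer does).
    public static async Task ForEachThrottledAsync<T>(
        IEnumerable<T> items, int maxParallel, Func<T, Task> work)
    {
        var block = new ActionBlock<T>(work, new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxParallel
        });

        foreach (var item in items)
        {
            block.Post(item); // the block's queue is unbounded by default
        }

        block.Complete();       // no more items will be posted
        await block.Completion; // wait until every queued item has been processed
    }

    // Demo: processes 20 fake "downloads" and returns the highest number
    // that were observed running at the same time.
    public static async Task<int> DemoAsync()
    {
        var gate = new object();
        int inFlight = 0, maxSeen = 0;

        await ForEachThrottledAsync(Enumerable.Range(0, 20), 5, async i =>
        {
            lock (gate) { inFlight++; maxSeen = Math.Max(maxSeen, inFlight); }
            await Task.Delay(50); // simulated I/O-bound work
            lock (gate) { inFlight--; }
        });

        return maxSeen;
    }
}
```

In the question's program, the call would presumably become something like await ThrottleSketch.ForEachThrottledAsync(longList, 5, async s => { ... }) with the body of the Task.WhenAll lambda moved inside.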

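Regarding HttpClient, using a single instance across threads is completely safe and highly encouraged. Its request methods such as GetAsync are designed for concurrent use, and creating a new instance per request can exhaust the available sockets under load.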
