
Use DownloadFileTaskAsync to download all files at once

Given an input text file containing URLs, I would like to download the corresponding files all at once. I used the answer to the question "UserState using WebClient and TaskAsync download from Async CTP" as a reference.

public void Run()
{
    List<string> urls = File.ReadAllLines(@"c:/temp/Input/input.txt").ToList();

    int index = 0;
    Task[] tasks = new Task[urls.Count()];
    foreach (string url in urls)
    {
        WebClient wc = new WebClient();
        string path = string.Format("{0}image-{1}.jpg", @"c:/temp/Output/", index+1);
        Task downloadTask = wc.DownloadFileTaskAsync(new Uri(url), path);
        Task outputTask = downloadTask.ContinueWith(t => Output(path));
        tasks[index] = outputTask;
        index++;
    }
    Console.WriteLine("Start now");
    Task.WhenAll(tasks);
    Console.WriteLine("Done");

}

public void Output(string path)
{
    Console.WriteLine(path);
}

I expected the downloading of the files to begin at the point of "Task.WhenAll(tasks)". But it turns out that the output looks like this:

c:/temp/Output/image-2.jpg
c:/temp/Output/image-1.jpg
c:/temp/Output/image-4.jpg
c:/temp/Output/image-6.jpg
c:/temp/Output/image-3.jpg
[many lines deleted]
Start now
c:/temp/Output/image-18.jpg
c:/temp/Output/image-19.jpg
c:/temp/Output/image-20.jpg
c:/temp/Output/image-21.jpg
c:/temp/Output/image-23.jpg
[many lines deleted]
Done

Why does the downloading begin before WaitAll is called? What can I change to achieve what I would like (i.e. all tasks will begin at the same time)?

Thanks

Why does the downloading begin before WaitAll is called?

First of all, you're not calling Task.WaitAll, which blocks synchronously; you're calling Task.WhenAll, which returns a task that should be awaited.
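To make the difference concrete, here is a minimal sketch. Task.Delay stands in for a download, and the class and method names (WhenAllDemo, RunAsync) are made up for illustration. It records whether the combined task has completed right after Task.WhenAll is called and again after it is awaited.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Returns the recorded observations in the order they were made.
    public static async Task<List<string>> RunAsync()
    {
        var log = new List<string>();

        // Task.Delay stands in for a download; it is already running when returned.
        Task work = Task.Delay(200);

        // WhenAll only builds a task that completes when 'work' completes.
        // Without an await (or a blocking wait), the caller just carries on.
        Task all = Task.WhenAll(work);
        log.Add("after WhenAll, completed: " + all.IsCompleted);

        await all; // this is the point where the method actually waits
        log.Add("after await, completed: " + all.IsCompleted);

        return log;
    }

    public static void Main()
    {
        foreach (string line in RunAsync().GetAwaiter().GetResult())
            Console.WriteLine(line);
    }
}
```

In the question's code the result of Task.WhenAll(tasks) is discarded, so "Done" can be printed while downloads are still in flight.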

Now, as others have said, calling an async method starts the asynchronous operation even if you don't await it, because any method conforming to the Task-based Asynchronous Pattern (TAP) returns a "hot" task.

What can I change to achieve what I would like (i.e. all tasks will begin at the same time)?

Now, if you want to defer execution until Task.WhenAll, you can use Enumerable.Select to project each element to a Task, and materialize the sequence when you pass it to Task.WhenAll:

public async Task RunAsync()
{
    IEnumerable<string> urls = File.ReadAllLines(@"c:/temp/Input/input.txt");

    // Select is lazy: no download starts until Task.WhenAll enumerates the sequence.
    var urlTasks = urls.Select(async (url, index) =>
    {
        using (WebClient wc = new WebClient())
        {
            string path = string.Format("{0}image-{1}.jpg", @"c:/temp/Output/", index + 1);

            await wc.DownloadFileTaskAsync(new Uri(url), path);
            Output(path); // runs only after this download has finished
        }
    });

    Console.WriteLine("Start now");
    await Task.WhenAll(urlTasks);
    Console.WriteLine("Done");
}
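The deferral relies on Select being lazy: the lambda runs only when the sequence is enumerated. The following sketch illustrates that ordering with Task.Delay in place of a real download; LazySelectDemo and its log are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LazySelectDemo
{
    public static async Task<List<string>> RunAsync()
    {
        var log = new List<string>();
        string[] names = { "a", "b" };

        // Nothing runs here: Select only describes the projection.
        IEnumerable<Task> tasks = names.Select(name =>
        {
            log.Add("started " + name); // recorded when the element is produced
            return Task.Delay(50);      // stand-in for a download
        });

        log.Add("Start now");
        await Task.WhenAll(tasks); // enumerates the sequence, which starts every task
        log.Add("Done");

        return log;
    }

    public static void Main()
    {
        foreach (string line in RunAsync().GetAwaiter().GetResult())
            Console.WriteLine(line);
    }
}
```

One caveat of this design: enumerating the sequence a second time (for example by calling Count() on it before WhenAll) runs the lambda again and would start every download twice. If the tasks are needed afterwards, materialize them once with ToList() at the point where the downloads should begin.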

Why does the downloading begin before WaitAll is called?

Because:

Tasks created by its public constructors are referred to as “cold” tasks, in that they begin their life cycle in the non-scheduled TaskStatus.Created state, and it's not until Start is called on these instances that they progress to being scheduled. All other tasks begin their life cycle in a “hot” state, meaning that the asynchronous operations they represent have already been initiated and their TaskStatus is an enumeration value other than Created. All tasks returned from TAP methods must be “hot.”

Since DownloadFileTaskAsync is a TAP method, it returns a "hot" (that is, already started) task.
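As a rough illustration of the cold/hot distinction, the sketch below compares a task built with the public constructor against a task returned by an async method. FakeDownloadAsync is a hypothetical stand-in for DownloadFileTaskAsync, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class HotColdDemo
{
    // Hypothetical stand-in for a TAP download method.
    static async Task FakeDownloadAsync(List<string> log, string name)
    {
        log.Add("started " + name);  // runs synchronously, as soon as the method is called
        await Task.Delay(50);
        log.Add("finished " + name);
    }

    // A task made with the public constructor stays idle until Start() is called.
    public static TaskStatus ColdStatus()
    {
        var cold = new Task(() => { });
        return cold.Status;
    }

    public static async Task<List<string>> RunAsync()
    {
        var log = new List<string>();

        Task hot = FakeDownloadAsync(log, "a"); // not awaited yet, but already running
        log.Add("caller continues");

        await hot;
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(ColdStatus());
        foreach (string line in RunAsync().GetAwaiter().GetResult())
            Console.WriteLine(line);
    }
}
```

The "started a" entry is recorded before "caller continues", which mirrors what happens in the question: each download is under way as soon as DownloadFileTaskAsync returns, long before Task.WhenAll is reached.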

What can I change to achieve what I would like (i.e. all tasks will begin at the same time)?

I'd look at TPL Dataflow. Something like this (I've used HttpClient instead of WebClient, but it doesn't really matter):

    // Requires: using System.Net.Http; using System.Threading.Tasks.Dataflow;
    static async Task DownloadData(IEnumerable<string> urls)
    {
        // we want to execute this in parallel
        var executionOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

        // this block receives a URL and downloads the content it points to
        var downloadBlock = new TransformBlock<string, Tuple<string, string>>(async url =>
        {
            using (var client = new HttpClient())
            {
                var content = await client.GetStringAsync(url);
                return Tuple.Create(url, content);
            }
        }, executionOptions);

        // this block prints the length of the downloaded string for each URL
        var outputBlock = new ActionBlock<Tuple<string, string>>(tuple =>
        {
            Console.WriteLine($"Downloaded {(string.IsNullOrEmpty(tuple.Item2) ? 0 : tuple.Item2.Length)} bytes from {tuple.Item1}");
        }, executionOptions);

        // link downloadBlock to outputBlock: every item that downloadBlock
        // produces is forwarded to outputBlock
        using (downloadBlock.LinkTo(outputBlock))
        {
            // feed downloadBlock with input data; it processes items as they arrive
            foreach (var url in urls)
            {
                await downloadBlock.SendAsync(url);
            }

            // tell downloadBlock that no more input will arrive
            downloadBlock.Complete();
            // wait until all downloads have finished
            await downloadBlock.Completion;
            // tell outputBlock that no more input will arrive
            outputBlock.Complete();
            // wait until all output has been printed
            await outputBlock.Completion;
        }
    }

    static void Main(string[] args)
    {
        var urls = new[]
        {
            "http://www.microsoft.com",
            "http://www.google.com",
            "http://stackoverflow.com",
            "http://www.amazon.com",
            "http://www.asp.net"
        };

        Console.WriteLine("Start now.");
        DownloadData(urls).Wait();
        Console.WriteLine("Done.");

        Console.ReadLine();
    }

Output:

Start now.
Downloaded 1020 bytes from http://www.microsoft.com
Downloaded 53108 bytes from http://www.google.com
Downloaded 244143 bytes from http://stackoverflow.com
Downloaded 468922 bytes from http://www.amazon.com
Downloaded 27771 bytes from http://www.asp.net
Done.

What can I change to achieve what I would like (i.e. all tasks will begin at the same time)?

To synchronize the beginning of the downloads you could use the Barrier class.

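  public void Run()
  {
      List<string> urls = File.ReadAllLines(@"c:/temp/Input/input.txt").ToList();

      // The post-phase action runs once, after every participant has reached the barrier.
      Barrier barrier = new Barrier(urls.Count, b => Console.WriteLine("Start now"));

      Task[] tasks = new Task[urls.Count];

      // Note: SignalAndWait blocks one thread per URL until all participants
      // have arrived, so this is only practical for a small number of URLs
      // and may stall with larger inputs.
      Parallel.For(0, urls.Count, index =>
      {
          string path = string.Format("{0}image-{1}.jpg", @"c:/temp/Output/", index + 1);
          tasks[index] = DownloadAsync(new Uri(urls[index]), path, barrier);
      });

      Task.WaitAll(tasks); // wait for completion
      Console.WriteLine("Done");
  }

  async Task DownloadAsync(Uri url, string path, Barrier barrier)
  {
      using (WebClient wc = new WebClient())
      {
          // block until every download is ready to start
          barrier.SignalAndWait();
          // DownloadFileTaskAsync returns an awaitable Task; DownloadFileAsync returns void
          await wc.DownloadFileTaskAsync(url, path);
          Output(path);
      }
  }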
