Using WebClient with a foreach loop to download about 100,000 web pages

Question

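I'm trying to build a small application that takes a list of roughly 100,000 to 200,000 URLs as input, downloads the HTML of each page, and saves it to a relative folder.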
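A minimal sketch of the input side, assuming .NET with C# 7 tuples. The helper `UrlJobs.Build` is hypothetical, not from the thread. It turns the pasted text into (URL, target file) pairs using the same `pageN.html` naming as the question. The code in this question splits on `"\r\n"` only. This sketch also accepts `"\n"` line endings, drops whitespace-only lines, and skips entries that are not absolute http/https URLs, so a bad line is rejected up front rather than failing partway through a long run.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: turns pasted text into (URL, target file) pairs.
public static class UrlJobs
{
    public static List<(Uri Url, string Target)> Build(string text, string folder)
    {
        var jobs = new List<(Uri Url, string Target)>();
        // Accept both Windows ("\r\n") and Unix ("\n") line endings.
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0)
                continue; // skip blank and whitespace-only lines

            // Reject anything that is not an absolute http/https URL.
            if (!Uri.TryCreate(line, UriKind.Absolute, out Uri uri) ||
                (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
                continue;

            // Same naming scheme as the question (pageN.html), numbered by accepted URL.
            jobs.Add((uri, Path.Combine(folder, "page" + jobs.Count + ".html")));
        }
        return jobs;
    }
}
```

Numbering here counts only accepted URLs, so a file number can differ from the line number of its URL in the input.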

I have two solutions, but each one has problems, and I'd like to find the best approach.

First solution: synchronous approach

Here is the code I'm using:

currentline = 0;
var lines = txtUrls.Lines.Where(line => !String.IsNullOrWhiteSpace(line)).Count();
string urltext = txtUrls.Text;
// Split the text box contents into one URL per line
List<string> list = new List<string>(
    txtUrls.Text.Split(new string[] { "\r\n" },
    StringSplitOptions.RemoveEmptyEntries));

lblStatus.Text = "Working";
btnStart.Enabled = false;

// Download the pages one at a time, each into .\pages\pageN.html
foreach (string url in list)
{
    using (WebClient client = new WebClient())
    {
        client.DownloadFile(url, @".\pages\page" + currentline + ".html");
        currentline++;
    }
}

lblStatus.Text = "Finished";
btnStart.Enabled = true;

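The code works, but it is very slow, and at random points after about 5,000 URLs it stops working while the process reports that it has finished. (Note that I actually run this code in a BackgroundWorker; to make it easier to read, I'm showing only the relevant part.)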
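One plausible cause of the "stops after ~5,000 URLs but says finished" symptom is an assumption, not something confirmed in the thread. `DownloadFile` throws a `WebException` when a URL times out or returns an HTTP error, and that ends the `foreach`. A `BackgroundWorker` does not crash on an exception thrown in `DoWork`. Instead it hands the exception to the `RunWorkerCompleted` handler in `e.Error`, so a handler that only sets "Finished" makes the run look complete. The sketch below catches failures per URL, records them, and keeps going. `SafeLoop.DownloadAll` is a hypothetical name, and the `download` delegate stands in for `client.DownloadFile`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: a failed URL is recorded and skipped instead of ending the whole run.
public static class SafeLoop
{
    // 'download' is a stand-in for the real call, e.g.
    // (url, path) => client.DownloadFile(url, path)  (hypothetical wiring).
    public static List<string> DownloadAll(IEnumerable<string> urls, Action<string, string> download)
    {
        var failures = new List<string>();
        int index = 0;
        foreach (string url in urls)
        {
            string path = Path.Combine("pages", "page" + index + ".html");
            try
            {
                download(url, path);
            }
            catch (Exception ex)
            {
                // Catches everything for simplicity; with WebClient this would mostly be WebException.
                failures.Add(url + ": " + ex.Message);
            }
            index++;
        }
        return failures;
    }
}
```

With WebClient this could be called as `SafeLoop.DownloadAll(list, (url, path) => client.DownloadFile(url, path))`, reusing a single `WebClient` for the sequential loop. Checking `e.Error` in `RunWorkerCompleted` would also show whether the worker ended on an exception.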

Second solution: asynchronous approach

int currentline = 0;

string urltext = txtUrls.Text;
List<string> list = new List<string>(
    txtUrls.Text.Split(new string[] { "\r\n" },
    StringSplitOptions.RemoveEmptyEntries));

foreach (var url in list)
{
    using (WebClient webClient = new WebClient())
    {
        webClient.DownloadFileCompleted += new AsyncCompletedEventHandler(Completed);
        webClient.DownloadProgressChanged += new DownloadProgressChangedEventHandler(ProgressChanged);
        // DownloadFileAsync starts the download and returns immediately
        webClient.DownloadFileAsync(new Uri(url), @".\pages\page" + currentline + ".html");
    }

    currentline++;
    // Counts downloads started, not downloads completed
    label1.Text = "No.of Lines Completed: " + currentline;
}

This code runs very fast, but most of the time I end up with 0 KB files. I'm sure the network isn't the problem, because I'm testing on an OVH dedicated server.
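A likely reason for the 0 KB files is offered here as a hedge, not verified against the asker's setup. `DownloadFileAsync` returns immediately, so the loop starts every download at once, and `label1` counts downloads started rather than finished. The target file may be created before its content arrives. On .NET Framework, `ServicePointManager.DefaultConnectionLimit` (2 per host by default in client apps) also makes most of those requests wait in a queue. A common alternative is to await the downloads while capping how many run at once. The hypothetical `ThrottledDownloader.DownloadAllAsync` below uses `SemaphoreSlim` to allow at most `maxParallel` downloads at a time. It returns the number of failures only after every task has finished.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Sketch: concurrent downloads, at most 'maxParallel' at a time,
// completing only after every download task has finished.
public static class ThrottledDownloader
{
    public static async Task<int> DownloadAllAsync(
        IReadOnlyList<string> urls,
        Func<string, string, Task> downloadOne, // hypothetical (url, path) => Task
        int maxParallel,
        IProgress<int> progress = null)
    {
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            int completed = 0;
            int failed = 0;
            var tasks = urls.Select(async (url, index) =>
            {
                await gate.WaitAsync(); // wait for a free slot
                try
                {
                    string path = Path.Combine("pages", "page" + index + ".html");
                    await downloadOne(url, path);
                }
                catch (Exception)
                {
                    Interlocked.Increment(ref failed); // count the failure, keep going
                }
                finally
                {
                    gate.Release();
                    progress?.Report(Interlocked.Increment(ref completed));
                }
            }).ToList();

            await Task.WhenAll(tasks);
            return failed;
        }
    }
}
```

With WebClient, each call needs its own instance, because one `WebClient` does not support concurrent operations, for example `async (url, path) => { using (var wc = new WebClient()) await wc.DownloadFileTaskAsync(url, path); }`. A `Progress<int>` created on the UI thread lets the completed count update the label safely. On .NET Framework, raising `ServicePointManager.DefaultConnectionLimit` is usually needed for more than two parallel requests to the same host.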

Can anyone point out what I'm doing wrong, or offer tips to improve it, or suggest a completely different solution?

Answer 1

Instead of DownloadFile(), try using:

public async Task GetData()
{
    using (WebClient client = new WebClient())
    {
        // Returns the response body as a byte[]
        var data = await client.DownloadDataTaskAsync("http://xxxxxxxxxxxxxxxxxxxxx");
    }
}

This gives you the data as a byte[]. Then just call File.WriteAllBytes() to save it to disk.
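As a sketch, the two steps from this answer can be combined into one method. The class name `PageSaver` is hypothetical. The method downloads the page into memory with `DownloadDataTaskAsync`, creates the target folder if it is missing, and writes the bytes with `File.WriteAllBytes`. It handles a single URL only, so 100,000 URLs would still need a loop that awaits each call or limits how many run at once. On .NET 6 and later, `WebClient` is marked obsolete in favor of `HttpClient`, but it still works.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;

public static class PageSaver
{
    // Downloads one URL into memory, then writes the bytes to 'path'.
    // Returns the number of bytes written.
    public static async Task<int> SaveAsync(string url, string path)
    {
        using (var client = new WebClient())
        {
            byte[] data = await client.DownloadDataTaskAsync(url);

            // Make sure the target folder (e.g. "pages") exists before writing.
            string folder = Path.GetDirectoryName(Path.GetFullPath(path));
            Directory.CreateDirectory(folder);

            File.WriteAllBytes(path, data);
            return data.Length;
        }
    }
}
```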

Solution 1
0 2016-06-19 06:15:56
