
Why do I get "An exception occurred during a WebClient request" when downloading images from a website?

The code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using HtmlAgilityPack;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Net;
using System.Web;
using System.Threading;
using DannyGeneral;
using GatherLinks;

namespace GatherLinks
{
    class RetrieveWebContent
    {
        HtmlAgilityPack.HtmlDocument doc;
        string imgg;
        int images;

        public RetrieveWebContent()
        {
            images = 0;
        }

        public List<string> retrieveImages(string address)
        {
            try
            {
                doc = new HtmlAgilityPack.HtmlDocument();
                System.Net.WebClient wc = new System.Net.WebClient();
                List<string> imgList = new List<string>();
                doc.Load(wc.OpenRead(address));
                HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
                if (imgs == null) return new List<string>();

                foreach (HtmlNode img in imgs)
                {
                    if (img.Attributes["src"] == null)
                        continue;
                    HtmlAttribute src = img.Attributes["src"];

                    imgList.Add(src.Value);
                    if (src.Value.StartsWith("http") || src.Value.StartsWith("https") || src.Value.StartsWith("www"))
                    {
                        images++;
                        string[] arr = src.Value.Split('/');
                        imgg = arr[arr.Length - 1];
                        wc.DownloadFile(src.Value, @"d:\MyImages\" + imgg);
                    }
                }

                return imgList;
            }
            catch
            {
                Logger.Write("There Was Problem Downloading The Image: " + imgg);
                return null;

            }
        }
    }
}

For example, this link gives the exception:

http://vanessawest.tripod.com/bundybowman.jpg

It enters the foreach loop, and after a few iterations it jumps to the catch block.

Now, for example, if the link is from another site:

www.walla.co.il

then it enters the foreach loop and downloads all the images without any problem.

This is the full exception message for this link:

http://vanessawest.tripod.com/bundybowman.jpg

System.Net.WebException was caught
  HResult=-2146233079
  Message=An exception occurred during a WebClient request.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at GatherLinks.RetrieveWebContent.retrieveImages(String address) in d:\C-Sharp\GatherLinks\GatherLinks-2\GatherLinks\GatherLinks\RetrieveWebContent.cs:line 55
  InnerException: System.ArgumentException
       HResult=-2147024809
       Message=Illegal characters in path.
       Source=mscorlib
       StackTrace:
            at System.IO.Path.CheckInvalidPathChars(String path, Boolean checkAdditional)
            at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
            at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
            at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       InnerException:

I can't figure out why it works fine on Walla but throws an exception on Tripod.

http://vanessawest.tripod.com/bundybowman.jpg

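Are you sure that is the link causing the problem? My guess is that the string-splitting logic you use to get the file name leaves some unwanted characters in the local file name, which would explain the inner "Illegal characters in path" exception.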

string[] arr = src.Value.Split('/');
imgg = arr[arr.Length - 1];

Instead, try this:

imgg = Path.GetFileName(new Uri(src.Value).LocalPath);
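
To make the idea a bit more concrete, here is a minimal sketch of a slightly more defensive variant. The class name ImageFileNames and the method FromUrl are hypothetical and not part of the original code. The sketch assumes the failing src values carry something like a query string (for example "?size=large"), which the question does not confirm. It also takes the last path segment by hand and replaces any remaining invalid characters, because as far as I know Path.GetFileName on the .NET Framework can itself reject a path that contains invalid characters.

```csharp
using System;
using System.IO;

static class ImageFileNames
{
    // Hypothetical helper: returns a file name that is safe to use locally,
    // or null when the src value cannot be used as a download address.
    public static string FromUrl(string url)
    {
        // Values such as "www.example.com/a.jpg" have no scheme and are not
        // absolute URIs, so they are skipped here instead of throwing.
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
            return null;

        // LocalPath leaves out the query string and the fragment.
        string path = uri.LocalPath;
        string name = path.Substring(path.LastIndexOf('/') + 1);

        // Replace whatever the local file system still does not accept.
        foreach (char c in Path.GetInvalidFileNameChars())
            name = name.Replace(c, '_');

        return name.Length == 0 ? null : name;
    }
}
```

In the loop from the question this could be used as imgg = ImageFileNames.FromUrl(src.Value); followed by skipping the image when the result is null. Catching the exception in a variable and logging its InnerException message would also show exactly which path was rejected.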
