
How do I download all images from a List<string> that contains many image links?

private List<string> retrieveImages(string address)
{

    System.Net.WebClient wc = new System.Net.WebClient();
    List<string> imgList = new List<string>();
    doc.Load(wc.OpenRead(address)); //or whatever HTML file you have 
    HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
    if (imgs == null) return new List<string>();

    foreach (HtmlNode img in imgs)
    {
        if (img.Attributes["src"] == null)
            continue;
        HtmlAttribute src = img.Attributes["src"];

        imgList.Add(src.Value);
        //Image imgDownload = GetImage(src.Value);
        //imgDownload.Save(@"d:\myImages");
    }
    return imgList;
}

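In some cases the List imgList contains 33 items, which look like this:

At index [0] I see: /images/experiments/nav_logo78.png. It is a link to an image, but it does not begin with http or www; it starts with /images.

Then at index [1] I see: //maps.gstatic.com/mapfiles/transparent.png

Then, some items later at index [10], I see: http://mt1.google.com/vt/lyrs=m@186000000&hl=iw&src=app&x=75&y=51&z=7&s=Gali

I'm not sure what Gali is; the link just ends with Gali and has no image extension such as .bmp, .gif, or .png.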
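For reference, the three kinds of src value listed above (root-relative, protocol-relative, and absolute) can all be turned into absolute URLs by resolving them against the address of the page they came from. The following is a minimal sketch using the `System.Uri(Uri, String)` constructor; the page address `http://www.example.com/maps/index.html` is a made-up placeholder, since the question does not say which page was crawled.

```csharp
using System;

class ResolveImageUrls
{
    static void Main()
    {
        // Hypothetical page address; the question does not state the real one.
        Uri pageUri = new Uri("http://www.example.com/maps/index.html");

        string[] srcValues =
        {
            "/images/experiments/nav_logo78.png",           // root-relative
            "//maps.gstatic.com/mapfiles/transparent.png",  // protocol-relative
            "http://mt1.google.com/vt/lyrs=m@186000000&hl=iw&src=app&x=75&y=51&z=7&s=Gali" // absolute
        };

        foreach (string src in srcValues)
        {
            // Uri(Uri, string) resolves a relative reference against the base;
            // a src that is already absolute is kept as it is.
            Uri absolute = new Uri(pageUri, src);
            Console.WriteLine(absolute.AbsoluteUri);
        }
    }
}
```

With that placeholder base, the first value should resolve to `http://www.example.com/images/experiments/nav_logo78.png` and the second to `http://maps.gstatic.com/mapfiles/transparent.png`, so neither has to be skipped just because it lacks a scheme.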

What I want is to download all of these images from their links and save them to my hard disk. So I have this function for downloading:

private Image GetImage(string url)
{
    System.Net.WebRequest request = System.Net.WebRequest.Create(url);

    System.Net.WebResponse response = request.GetResponse();
    System.IO.Stream responseStream = response.GetResponseStream();

    Bitmap bmp = new Bitmap(responseStream);

    responseStream.Dispose();

    return bmp;
} 

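When I call this GetImage function inside retrieveImages(), the program does not seem to do anything; I mean the List imgList ends up empty. These are the two lines in question, currently commented out: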

//Image imgDownload = GetImage(src.Value);
//imgDownload.Save(@"d:\myImages");

With them commented out using //, everything works fine, but when I use them it does not work and nothing is saved to my hard disk.

What should I do?

Edit:

I just changed my retrieveImages function to:

private List<string> retrieveImages(string address)
        {

            System.Net.WebClient wc = new System.Net.WebClient();
            List<string> imgList = new List<string>();
            doc.Load(wc.OpenRead(address)); 
            HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
            if (imgs == null) return new List<string>();

            foreach (HtmlNode img in imgs)
            {
                if (img.Attributes["src"] == null)
                    continue;
                HtmlAttribute src = img.Attributes["src"];

                imgList.Add(src.Value);
                wc.DownloadFile(src.Value ,  @"d:\MyImages\my.gif");
            }
            return imgList;
        }

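I put a breakpoint on the wc.DownloadFile line, and it throws an exception: WebException was caught

Could not find a part of the path 'D:\textinputassistant\tia.png'.

In this case src.Value contains: /textinputassistant/tia.png. So, as you told me, I should avoid links that do not begin with http, https, or www, and I will fix that. The question is whether the exception occurs because this value starts with / and has no http/https/www?

The full exception: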

System.Net.WebException was caught
  Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at GatherLinks.Form1.retrieveImages(String address) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 328
       at GatherLinks.Form1.webCrawler(String url, Int32 levels, DoWorkEventArgs eve) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 97
  InnerException: System.Net.WebException
       Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
       Source=System
       StackTrace:
            at System.Net.FileWebResponse..ctor(FileWebRequest request, Uri uri, FileAccess access, Boolean asyncHint)
            at System.Net.FileWebRequest.GetResponseCallback(Object state)
       InnerException: System.IO.DirectoryNotFoundException
            Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
            Source=mscorlib
            StackTrace:
                 at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
                 at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath)
                 at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy)
                 at System.Net.FileWebStream..ctor(FileWebRequest request, String path, FileMode mode, FileAccess access, FileShare sharing, Int32 length, Boolean async)
                 at System.Net.FileWebResponse..ctor(FileWebRequest request, Uri uri, FileAccess access, Boolean asyncHint)

InnerException:
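The `FileWebRequest` and `FileStream` frames in the trace above suggest that the value `/textinputassistant/tia.png` was handled as a local file path rather than as a web address, which would explain the `DirectoryNotFoundException`. One way to guard against that is to check the scheme before downloading. This is only a sketch; `IsHttpUrl` is a hypothetical helper name, not part of the original code.

```csharp
using System;

static class UrlFilter
{
    // Hypothetical helper: true only for absolute http:// or https:// URLs.
    public static bool IsHttpUrl(string value)
    {
        Uri uri;
        return Uri.TryCreate(value, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    static void Main()
    {
        Console.WriteLine(IsHttpUrl("/textinputassistant/tia.png"));    // False
        Console.WriteLine(IsHttpUrl("http://www.example.com/tia.png")); // True
    }
}
```

Checking the parsed scheme is stricter than a substring test, because a substring test also accepts a relative path that merely contains the text "http" somewhere inside it.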

I just added a filter so that it only saves links that contain http:

private List<string> retrieveImages(string address)
        {

            System.Net.WebClient wc = new System.Net.WebClient();
            List<string> imgList = new List<string>();
            doc.Load(wc.OpenRead(address));
            HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
            if (imgs == null) return new List<string>();

            foreach (HtmlNode img in imgs)
            {
                if (img.Attributes["src"] == null)
                    continue;
                HtmlAttribute src = img.Attributes["src"];

                imgList.Add(src.Value);
                if (src.Value.Contains("http"))
                {
                    wc.DownloadFile(src.Value, @"d:\MyImages\my.gif");
                }
            }
            return imgList;
        }

Now src.Value contains: http://mt1.google.com/vt/lyrs=m@186000000&hl=iw&src=app&x=75&y=51&z=7&s=Gali

Then, after trying to download it, I get an exception: WebException was caught

The remote server returned an error: (403) Forbidden.

System.Net.WebException was caught
  Message=The remote server returned an error: (403) Forbidden.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at GatherLinks.Form1.retrieveImages(String address) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 330
       at GatherLinks.Form1.webCrawler(String url, Int32 levels, DoWorkEventArgs eve) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 97
  InnerException: 

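The question is whether the exception is thrown because the site, Google in this case, blocks the download, or because the link ends with Gali, which does not say what type of file it is?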
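Whatever the reason for the 403, the status code comes from the server's response, and `WebClient.DownloadFile` writes the received bytes regardless of what the URL ends with. Since a single refused image currently aborts the whole loop, one option is to catch the `WebException` per image and carry on. The sketch below assumes a hypothetical helper named `TryDownload`; it does not try to work around the refusal.

```csharp
using System;
using System.Net;

static class SafeDownloader
{
    // Hypothetical helper: downloads one file and reports failure instead of throwing.
    public static bool TryDownload(WebClient client, Uri address, string fileName)
    {
        try
        {
            client.DownloadFile(address, fileName);
            return true;
        }
        catch (WebException ex)
        {
            // For HTTP errors the response carries the status code (e.g. 403);
            // otherwise fall back to the WebExceptionStatus value.
            HttpWebResponse response = ex.Response as HttpWebResponse;
            string status = response != null
                ? ((int)response.StatusCode).ToString()
                : ex.Status.ToString();
            Console.WriteLine("Skipped " + address + " (" + status + ")");
            return false;
        }
    }
}
```

Inside the foreach loop this could be called as `SafeDownloader.TryDownload(wc, imageUri, localPath)`, where `imageUri` and `localPath` stand for the resolved address and the target file; the return value tells the caller whether the file was saved.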

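I would first ignore the images that do not have a valid link, i.e. no http://

To save a file to disk, you can simply download the binary and save it, like this: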

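string URL = "http://www.yourdomain.com/file1.zip";
// Verbatim string: in a regular literal, "\f" would be read as a form-feed escape.
string DestinationPath = @"C:\file1.jpg";
System.Net.WebClient Client = new System.Net.WebClient();
Client.DownloadFile(URL, DestinationPath);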

You do not need to convert the image to a .NET Image in order to save it. Some import applications I wrote recently have similar code.
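One more detail worth noting: the loop in the question writes every image to the same file, d:\MyImages\my.gif, so each download overwrites the previous one. A possible fix is to derive a separate file name from each URL. The sketch below is an assumption-laden illustration: `BuildLocalPath` is a hypothetical helper, and it expects an absolute URI.

```csharp
using System;
using System.IO;

static class ImageFileNames
{
    // Hypothetical helper: builds a distinct local path for the index-th image.
    // Assumes imageUri is absolute (Uri.Segments throws for relative URIs).
    public static string BuildLocalPath(string folder, Uri imageUri, int index)
    {
        // Last path segment of the URL, e.g. "nav_logo78.png".
        string name = imageUri.Segments[imageUri.Segments.Length - 1];

        // Replace characters that are not allowed in file names on this OS.
        foreach (char c in Path.GetInvalidFileNameChars())
            name = name.Replace(c, '_');

        // The index prefix keeps two images with the same name from colliding.
        return Path.Combine(folder, string.Format("{0:D3}_{1}", index, name));
    }

    static void Main()
    {
        Uri uri = new Uri("http://www.example.com/images/experiments/nav_logo78.png");
        Console.WriteLine(BuildLocalPath(@"d:\MyImages", uri, 0));
    }
}
```

On Windows this should print d:\MyImages\000_nav_logo78.png. The target folder has to exist before `DownloadFile` is called; `Directory.CreateDirectory(folder)` creates it if needed and does nothing when it is already there.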
