How do I download all images from a List<string> that contains many image links?

private List<string> retrieveImages(string address)
{

    System.Net.WebClient wc = new System.Net.WebClient();
    List<string> imgList = new List<string>();
    doc.Load(wc.OpenRead(address)); //or whatever HTML file you have 
    HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
    if (imgs == null) return new List<string>();

    foreach (HtmlNode img in imgs)
    {
        if (img.Attributes["src"] == null)
            continue;
        HtmlAttribute src = img.Attributes["src"];

        imgList.Add(src.Value);
        //Image imgDownload = GetImage(src.Value);
        //imgDownload.Save(@"d:\myImages");
    }
    return imgList;
}

In some cases the list imgList contains 33 items, and it looks like this:

At index [0] I see /images/experiments/nav_logo78.png as the link for the image. There is no http or www at the start; it just starts with /images.

Then at index [1] I see: //maps.gstatic.com/mapfiles/transparent.png

Then, a few items later, at index [10] I see: http://mt1.google.com/vt/lyrs=m@186000000&hl=iw&src=app&x=75&y=51&z=7&s=Gali

I'm not sure what Gali is. I don't see a .bmp, .gif, or .png extension, just Gali.

What I want is to download the image behind each link and save it to my hard disk. I have this function for the download:

private Image GetImage(string url)
{
    System.Net.WebRequest request = System.Net.WebRequest.Create(url);

    System.Net.WebResponse response = request.GetResponse();
    System.IO.Stream responseStream = response.GetResponseStream();

    Bitmap bmp = new Bitmap(responseStream);

    responseStream.Dispose();

    return bmp;
} 

When I use this GetImage function inside retrieveImages(), nothing works: the program does not run properly and the list imgList ends up empty. These are the two lines, commented out as they are now:

//Image imgDownload = GetImage(src.Value);
//imgDownload.Save(@"d:\myImages");

If I comment them out with //, everything works, but if I use them, nothing works and nothing is saved to my hard disk.

What should I do?

Edit:

I just changed my retrieveImages function to this:

private List<string> retrieveImages(string address)
        {

            System.Net.WebClient wc = new System.Net.WebClient();
            List<string> imgList = new List<string>();
            doc.Load(wc.OpenRead(address)); 
            HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
            if (imgs == null) return new List<string>();

            foreach (HtmlNode img in imgs)
            {
                if (img.Attributes["src"] == null)
                    continue;
                HtmlAttribute src = img.Attributes["src"];

                imgList.Add(src.Value);
                wc.DownloadFile(src.Value ,  @"d:\MyImages\my.gif");
            }
            return imgList;
        }

I put a breakpoint on the wc.DownloadFile line, and it threw an exception: WebException was caught

Could not find a part of the path 'D:\\textinputassistant\\tia.png'.

In this case src.Value contains /textinputassistant/tia.png. You told me to skip links that do not start with http, https, or www, and I will fix that. The question is whether the exception occurs because this value starts with / and has no http/https/www.

The full exception:

System.Net.WebException was caught
  Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at GatherLinks.Form1.retrieveImages(String address) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 328
       at GatherLinks.Form1.webCrawler(String url, Int32 levels, DoWorkEventArgs eve) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 97
  InnerException: System.Net.WebException
       Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
       Source=System
       StackTrace:
            at System.Net.FileWebResponse..ctor(FileWebRequest request, Uri uri, FileAccess access, Boolean asyncHint)
            at System.Net.FileWebRequest.GetResponseCallback(Object state)
       InnerException: System.IO.DirectoryNotFoundException
            Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
            Source=mscorlib
            StackTrace:
                 at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
                 at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath)
                 at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy)
                 at System.Net.FileWebStream..ctor(FileWebRequest request, String path, FileMode mode, FileAccess access, FileShare sharing, Int32 length, Boolean async)
                 at System.Net.FileWebResponse..ctor(FileWebRequest request, Uri uri, FileAccess access, Boolean asyncHint)

InnerException:

I just added a filter so that only links containing http are downloaded:

private List<string> retrieveImages(string address)
        {

            System.Net.WebClient wc = new System.Net.WebClient();
            List<string> imgList = new List<string>();
            doc.Load(wc.OpenRead(address));
            HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
            if (imgs == null) return new List<string>();

            foreach (HtmlNode img in imgs)
            {
                if (img.Attributes["src"] == null)
                    continue;
                HtmlAttribute src = img.Attributes["src"];

                imgList.Add(src.Value);
                if (src.Value.Contains("http"))
                {
                    wc.DownloadFile(src.Value, @"d:\MyImages\my.gif");
                }
            }
            return imgList;
        }

Now src.Value contains: http://mt1.google.com/vt/lyrs=m@186000000&hl=iw&src=app&x=75&y=51&z=7&s=Gali

Then, when it tries to download, I get an exception: WebException was caught

The remote server returned an error: (403) Forbidden.

System.Net.WebException was caught
  Message=The remote server returned an error: (403) Forbidden.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at GatherLinks.Form1.retrieveImages(String address) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 330
       at GatherLinks.Form1.webCrawler(String url, Int32 levels, DoWorkEventArgs eve) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 97
  InnerException: 

The question is whether the exception is thrown because the site, Google in this case, is blocking downloads, or because the link ends with Gali, and I'm not sure what type of file that is.

I would first ignore images that don't have a valid link, i.e. no http://

To save a file straight to disk, you can download its binary content and save it like so:
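
As an alternative to dropping every link that lacks http://, here is a minimal sketch (an assumption, not the answerer's code) that resolves each src value against the page address with System.Uri. Root-relative values such as /images/... and protocol-relative values such as //maps.gstatic.com/... then become absolute URLs, and anything that is not http or https is rejected. The names ImageUrlResolver and Resolve, and the www.example.com page address in the usage note, are hypothetical.

```csharp
using System;

public static class ImageUrlResolver
{
    // Returns an absolute http/https URL for an <img src> value,
    // or null when the value cannot be resolved to one.
    public static string Resolve(string pageAddress, string src)
    {
        if (string.IsNullOrWhiteSpace(src))
            return null;

        // The page address must itself be an absolute URL.
        Uri baseUri;
        if (!Uri.TryCreate(pageAddress, UriKind.Absolute, out baseUri))
            return null;

        // Combines the base with relative values; absolute values pass through.
        Uri absolute;
        if (!Uri.TryCreate(baseUri, src.Trim(), out absolute))
            return null;

        // Keep only schemes a web download can handle.
        if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps)
            return null;

        return absolute.AbsoluteUri;
    }
}
```

For example, with a page address of http://www.example.com/, calling ImageUrlResolver.Resolve on /images/experiments/nav_logo78.png should yield http://www.example.com/images/experiments/nav_logo78.png. Passing a bare /textinputassistant/tia.png straight to DownloadFile, by contrast, is treated as a local file path, which matches the FileWebRequest frames in the stack trace above.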

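string url = "http://www.yourdomain.com/file1.zip";
// Verbatim string: in a regular literal, "\f" would be read as a form-feed escape.
string destinationPath = @"C:\file1.jpg";
using (System.Net.WebClient client = new System.Net.WebClient())
{
    // Writes the response bytes straight to the destination file.
    client.DownloadFile(url, destinationPath);
}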

You don't have to convert an image to a .NET Image to save it. I have some similar code in some import apps I wrote recently.
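
The code in the question writes every image to the same d:\MyImages\my.gif, so each download would overwrite the previous one. Below is a rough sketch of a loop that gives each URL its own file name and keeps going when one download fails, for instance with the (403) Forbidden shown above. It assumes the list already holds absolute URLs. ImageDownloader, BuildFileName, and DownloadAll are hypothetical names, and the numeric file-name prefix is just one possible naming scheme.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class ImageDownloader
{
    // Builds a distinct local file name per URL, e.g. "001_transparent.png".
    public static string BuildFileName(Uri url, int index)
    {
        string name = Path.GetFileName(url.AbsolutePath);
        if (string.IsNullOrEmpty(name))
            name = "image";
        // Replace characters the file system does not allow in a file name.
        foreach (char c in Path.GetInvalidFileNameChars())
            name = name.Replace(c, '_');
        return index.ToString("D3") + "_" + name;
    }

    // Downloads each http/https URL into targetDirectory.
    // Returns the URLs that were skipped or failed.
    public static List<string> DownloadAll(IEnumerable<string> urls, string targetDirectory)
    {
        var failed = new List<string>();
        Directory.CreateDirectory(targetDirectory); // no-op if it already exists
        int index = 0;
        using (var client = new WebClient())
        {
            foreach (string url in urls)
            {
                Uri uri;
                bool usable = Uri.TryCreate(url, UriKind.Absolute, out uri)
                    && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
                if (!usable)
                {
                    failed.Add(url);
                    continue;
                }
                string path = Path.Combine(targetDirectory, BuildFileName(uri, index++));
                try
                {
                    client.DownloadFile(uri, path);
                }
                catch (WebException)
                {
                    // One refused or missing image should not stop the rest.
                    failed.Add(url);
                }
            }
        }
        return failed;
    }
}
```

A call such as ImageDownloader.DownloadAll(imgList, @"d:\MyImages") would then save what it can and return the links that could not be downloaded, which can be logged or retried separately.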
