簡體   English   中英

如何從列表中下載所有圖像 <string> 有很多圖片鏈接?

[英]How do i download all images from a List<string> with many links for images inside?

private List<string> retrieveImages(string address)
{

    System.Net.WebClient wc = new System.Net.WebClient();
    List<string> imgList = new List<string>();
    doc.Load(wc.OpenRead(address)); //or whatever HTML file you have 
    HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
    if (imgs == null) return new List<string>();

    foreach (HtmlNode img in imgs)
    {
        if (img.Attributes["src"] == null)
            continue;
        HtmlAttribute src = img.Attributes["src"];

        imgList.Add(src.Value);
        //Image imgDownload = GetImage(src.Value);
        //imgDownload.Save(@"d:\myImages");
    }
    return imgList;
}

在某些情況下,List imgList包含33個項目,看起來像這樣:

首先是[0]我看到: /images/experiments/nav_logo78.png作為我看不到的圖像的鏈接,而開頭或www開頭的http是以/images開頭

然后到位[1]我看到: //maps.gstatic.com/mapfiles/transparent.png

然后,在放置了一些項目[10]之后,我看到了: http : //mt1.google.com/vt/lyrs=m@186000000&hl=iw&src=app&x=75&y=51&z=7&s=Gali

我不確定是什么加利,我只看到Gali看不到.bmp .gif.png圖像。

我想要的是從每個鏈接下載所有這些圖像並將其保存到我的硬盤上。 所以我有這個功能可以下載:

private Image GetImage(string url)
{
    System.Net.WebRequest request = System.Net.WebRequest.Create(url);

    System.Net.WebResponse response = request.GetResponse();
    System.IO.Stream responseStream = response.GetResponseStream();

    Bitmap bmp = new Bitmap(responseStream);

    responseStream.Dispose();

    return bmp;
} 

當我在retrieveImages()函數中使用此GetImage函數時,它甚至不執行任何程序,甚至不執行任何操作,我的意思是List imgList為空。 如果我將這兩行標記為現在:

//Image imgDownload = GetImage(src.Value);
//imgDownload.Save(@"d:\myImages");

如果我將它們標記為//不可以使用它們,那么evrything可以正常工作,但是如果im使用它們則無法正常工作,並且不會將任何內容保存到我的硬盤上。

我該怎么辦?

編輯:

我只是將我的retrieveImages函數更改為:

private List<string> retrieveImages(string address)
        {

            System.Net.WebClient wc = new System.Net.WebClient();
            List<string> imgList = new List<string>();
            doc.Load(wc.OpenRead(address)); 
            HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
            if (imgs == null) return new List<string>();

            foreach (HtmlNode img in imgs)
            {
                if (img.Attributes["src"] == null)
                    continue;
                HtmlAttribute src = img.Attributes["src"];

                imgList.Add(src.Value);
                wc.DownloadFile(src.Value ,  @"d:\MyImages\my.gif");
            }
            return imgList;
        }

我在wc.DownloadFile行上使用了一個斷點,這使我拋出異常:Webexception被捕獲

找不到路徑“ D:\\ textinputassistant \\ tia.png”的一部分。

在src.Value中,它在這種情況下會消失:/textinputassistant/tia.png因此,您告訴我避免一開始沒有http或https或www的鏈接,我將對其進行修復。 問題是是否例外,因為此行以/開頭,並且沒有任何http / s / www嗎?

完整的例外:

System.Net.WebException was caught
  Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at GatherLinks.Form1.retrieveImages(String address) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 328
       at GatherLinks.Form1.webCrawler(String url, Int32 levels, DoWorkEventArgs eve) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 97
  InnerException: System.Net.WebException
       Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
       Source=System
       StackTrace:
            at System.Net.FileWebResponse..ctor(FileWebRequest request, Uri uri, FileAccess access, Boolean asyncHint)
            at System.Net.FileWebRequest.GetResponseCallback(Object state)
       InnerException: System.IO.DirectoryNotFoundException
            Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
            Source=mscorlib
            StackTrace:
                 at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
                 at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath)
                 at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy)
                 at System.Net.FileWebStream..ctor(FileWebRequest request, String path, FileMode mode, FileAccess access, FileShare sharing, Int32 length, Boolean async)
                 at System.Net.FileWebResponse..ctor(FileWebRequest request, Uri uri, FileAccess access, Boolean asyncHint)

InnerException:

剛剛添加了一個過濾器,因此它將僅保存以http開頭的鏈接:

private List<string> retrieveImages(string address)
        {

            System.Net.WebClient wc = new System.Net.WebClient();
            List<string> imgList = new List<string>();
            doc.Load(wc.OpenRead(address));
            HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
            if (imgs == null) return new List<string>();

            foreach (HtmlNode img in imgs)
            {
                if (img.Attributes["src"] == null)
                    continue;
                HtmlAttribute src = img.Attributes["src"];

                imgList.Add(src.Value);
                if (src.Value.Contains("http"))
                {
                    wc.DownloadFile(src.Value, @"d:\MyImages\my.gif");
                }
            }
            return imgList;
        }

現在src.Value包含: http ://mt1.google.com/vt/lyrs=m@186000000&hl=iw&src=app&x=75&y=51&z=7&s=Gali

然后嘗試下載即時消息后出現異常:WebException被捕獲

遠程服務器返回錯誤:(403)禁止。

System.Net.WebException was caught
  Message=The remote server returned an error: (403) Forbidden.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at GatherLinks.Form1.retrieveImages(String address) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 330
       at GatherLinks.Form1.webCrawler(String url, Int32 levels, DoWorkEventArgs eve) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 97
  InnerException: 

問題是是否由於該網站在這種情況下google阻止了下載而引發了異常,或者由於以Gali wich結尾的鏈接不確定該文件是哪種類型?

我首先會忽略沒有有效鏈接的圖像,即沒有http://

為了將文件保存到磁盤,您可以為其下載二進制文件並進行保存,如下所示:

string URL="http://www.yourdomain.com/file1.zip";
string DestinationPath="C:\file1.jpg";
System.Net.WebClient Client = new WebClient();
Client.DownloadFile(URL,DestinationPath);

您無需將圖像轉換為.net圖像即可保存。 我最近寫的一些導入應用程序中有一些類似的代碼

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM