private List<string> retrieveImages(string address)
{
    System.Net.WebClient wc = new System.Net.WebClient();
    List<string> imgList = new List<string>();
    doc.Load(wc.OpenRead(address)); //or whatever HTML file you have
    HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
    if (imgs == null) return new List<string>();
    foreach (HtmlNode img in imgs)
    {
        if (img.Attributes["src"] == null)
            continue;
        HtmlAttribute src = img.Attributes["src"];
        imgList.Add(src.Value);
        //Image imgDownload = GetImage(src.Value);
        //imgDownload.Save(@"d:\myImages");
    }
    return imgList;
}
In one case the list imgList contains 33 items, and they look like this:
At index [0] I see: /images/experiments/nav_logo78.png
This link has no http or www at the start; it just begins with /images.
At index [1] I see: //maps.gstatic.com/mapfiles/transparent.png
A few items later, at index [10], I see: http://mt1.google.com/vt/lyrs=m@186000000&hl=iw&src=app&x=75&y=51&z=7&s=Gali
I'm not sure what Gali is. I don't see a .bmp, .gif or .png extension, just Gali.
What I want is to download the image behind each of these links and save it to my hard disk. I have this function for the download:
private Image GetImage(string url)
{
    System.Net.WebRequest request = System.Net.WebRequest.Create(url);
    System.Net.WebResponse response = request.GetResponse();
    System.IO.Stream responseStream = response.GetResponseStream();
    Bitmap bmp = new Bitmap(responseStream);
    responseStream.Dispose();
    return bmp;
}
When I use this GetImage function inside retrieveImages(), it doesn't do anything; the program doesn't even work, I mean the list imgList ends up empty. If I comment out these two lines, as they are now:
//Image imgDownload = GetImage(src.Value);
//imgDownload.Save(@"d:\myImages");
then everything works, but if I use them nothing works and nothing is saved to my hard disk.
What should I do?
Edit:
I just changed my retrieveImages function to this:
private List<string> retrieveImages(string address)
{
    System.Net.WebClient wc = new System.Net.WebClient();
    List<string> imgList = new List<string>();
    doc.Load(wc.OpenRead(address));
    HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
    if (imgs == null) return new List<string>();
    foreach (HtmlNode img in imgs)
    {
        if (img.Attributes["src"] == null)
            continue;
        HtmlAttribute src = img.Attributes["src"];
        imgList.Add(src.Value);
        wc.DownloadFile(src.Value , @"d:\MyImages\my.gif");
    }
    return imgList;
}
I set a breakpoint on the wc.DownloadFile line, and it throws an exception: WebException was caught
Could not find a part of the path 'D:\\textinputassistant\\tia.png'.
In this case src.Value contains: /textinputassistant/tia.png. You told me to skip links that don't start with http, https or www, and I will fix that. My question is whether the exception is thrown because this link starts with / and has no http/https/www prefix.
The full exception:
System.Net.WebException was caught
Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
Source=System
StackTrace:
at System.Net.WebClient.DownloadFile(Uri address, String fileName)
at System.Net.WebClient.DownloadFile(String address, String fileName)
at GatherLinks.Form1.retrieveImages(String address) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 328
at GatherLinks.Form1.webCrawler(String url, Int32 levels, DoWorkEventArgs eve) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 97
InnerException: System.Net.WebException
Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
Source=System
StackTrace:
at System.Net.FileWebResponse..ctor(FileWebRequest request, Uri uri, FileAccess access, Boolean asyncHint)
at System.Net.FileWebRequest.GetResponseCallback(Object state)
InnerException: System.IO.DirectoryNotFoundException
Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
Source=mscorlib
StackTrace:
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy)
at System.Net.FileWebStream..ctor(FileWebRequest request, String path, FileMode mode, FileAccess access, FileShare sharing, Int32 length, Boolean async)
at System.Net.FileWebResponse..ctor(FileWebRequest request, Uri uri, FileAccess access, Boolean asyncHint)
I just added a filter so that only links containing http are downloaded:
private List<string> retrieveImages(string address)
{
    System.Net.WebClient wc = new System.Net.WebClient();
    List<string> imgList = new List<string>();
    doc.Load(wc.OpenRead(address));
    HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
    if (imgs == null) return new List<string>();
    foreach (HtmlNode img in imgs)
    {
        if (img.Attributes["src"] == null)
            continue;
        HtmlAttribute src = img.Attributes["src"];
        imgList.Add(src.Value);
        if (src.Value.Contains("http"))
        {
            wc.DownloadFile(src.Value, @"d:\MyImages\my.gif");
        }
    }
    return imgList;
}
Now src.Value contains: http://mt1.google.com/vt/lyrs=m@186000000&hl=iw&src=app&x=75&y=51&z=7&s=Gali
When it then tries to download, I get an exception: WebException was caught
The remote server returned an error: (403) Forbidden.
System.Net.WebException was caught
Message=The remote server returned an error: (403) Forbidden.
Source=System
StackTrace:
at System.Net.WebClient.DownloadFile(Uri address, String fileName)
at System.Net.WebClient.DownloadFile(String address, String fileName)
at GatherLinks.Form1.retrieveImages(String address) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 330
at GatherLinks.Form1.webCrawler(String url, Int32 levels, DoWorkEventArgs eve) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 97
InnerException:
My question is whether the exception is thrown because the site (Google in this case) is blocking downloads, or because the link ends with Gali, and I'm not sure what type of file that is.
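On the file-type half of that question: the text at the end of a URL does not have to be a file name with an extension, and here s=Gali looks like a parameter for the tile server rather than a file type. The type of the returned data is normally announced in the response's Content-Type header, which WebClient exposes through ResponseHeaders after a successful request. The following is only a minimal sketch of choosing an extension from that header; the helper name and the small mapping table are assumptions made for illustration, and it neither explains nor works around the 403 itself.

```csharp
using System;

public static class ContentTypeHelper
{
    // Hypothetical helper: maps a Content-Type header value to a file extension.
    // Unknown or missing types fall back to ".bin".
    public static string ExtensionFromContentType(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
            return ".bin";

        // Drop parameters such as "; charset=utf-8" and normalise the case.
        string mediaType = contentType.Split(';')[0].Trim().ToLowerInvariant();

        switch (mediaType)
        {
            case "image/png": return ".png";
            case "image/jpeg": return ".jpg";
            case "image/gif": return ".gif";
            case "image/bmp": return ".bmp";
            default: return ".bin";
        }
    }
}
```

After a successful wc.DownloadFile or wc.DownloadData call, the header value could be read with wc.ResponseHeaders["Content-Type"] and passed to this helper.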
I would first ignore images that don't have a valid link, i.e. ones that don't start with http://.
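As an alternative to skipping them, links such as /images/experiments/nav_logo78.png or //maps.gstatic.com/mapfiles/transparent.png can usually be turned into absolute URLs by resolving them against the address of the page they were found on. When WebClient is handed a bare /textinputassistant/tia.png and has no base address, it treats it as a local file path, which matches the FileWebRequest frames and the 'D:\textinputassistant\tia.png' message in the stack trace. A minimal sketch, assuming the page address is the same string that is passed to retrieveImages; the class and method names are hypothetical:

```csharp
using System;

public static class ImageUrlHelper
{
    // Resolves an <img> src value against the address of the page it came from.
    // Returns null when the inputs cannot be parsed or the result is not http/https.
    public static string ResolveImageUrl(string pageAddress, string src)
    {
        Uri baseUri;
        if (!Uri.TryCreate(pageAddress, UriKind.Absolute, out baseUri))
            return null;

        // Handles absolute, root-relative ("/x.png") and protocol-relative ("//host/x.png") values.
        Uri resolved;
        if (!Uri.TryCreate(baseUri, src, out resolved))
            return null;

        // Skip things like data: or javascript: sources, which cannot be downloaded over HTTP.
        if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps)
            return null;

        return resolved.AbsoluteUri;
    }
}
```

Inside the foreach loop this could be used as string url = ImageUrlHelper.ResolveImageUrl(address, src.Value); followed by a null check before downloading.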
To save a file straight to disk, you can download its raw bytes and write them out like so:
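string URL = "http://www.yourdomain.com/file1.zip";
// Verbatim string: without the @, "\f" would be read as a form-feed escape.
string DestinationPath = @"C:\file1.jpg";
System.Net.WebClient Client = new System.Net.WebClient();
// Downloads the resource and writes it to DestinationPath as-is.
Client.DownloadFile(URL, DestinationPath);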
You don't have to convert an image to a .NET Image in order to save it. I have some similar code in a few import apps I wrote recently.
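One thing to watch with the code in the question is that every image is written to the same d:\MyImages\my.gif, so each download overwrites the previous one. The sketch below gives each image its own file name derived from the URL and keeps going when a single download fails, for example with 403 Forbidden. It assumes the URLs passed in are already absolute; the class name, method names and the index prefix are illustrative choices, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class ImageSaver
{
    // Builds a distinct local path per image so downloads do not overwrite each other.
    // imageUrl must be an absolute URL; index is a running counter supplied by the caller.
    public static string BuildLocalPath(string folder, string imageUrl, int index)
    {
        string name = Path.GetFileName(new Uri(imageUrl).AbsolutePath);
        if (string.IsNullOrEmpty(name))
            name = "image";

        // Replace characters that are not allowed in file names on this platform.
        foreach (char c in Path.GetInvalidFileNameChars())
            name = name.Replace(c, '_');

        return Path.Combine(folder, index + "_" + name);
    }

    // Downloads every URL into folder and returns the paths that were saved.
    public static List<string> DownloadAll(IEnumerable<string> absoluteUrls, string folder)
    {
        Directory.CreateDirectory(folder); // no-op if the folder already exists
        var saved = new List<string>();
        int index = 0;

        using (var client = new WebClient())
        {
            foreach (string url in absoluteUrls)
            {
                string target = BuildLocalPath(folder, url, index++);
                try
                {
                    client.DownloadFile(url, target);
                    saved.Add(target);
                }
                catch (WebException)
                {
                    // The server refused or the request failed: skip this image and continue.
                }
            }
        }
        return saved;
    }
}
```

A call such as ImageSaver.DownloadAll(urls, @"d:\MyImages") would then save each image under its own name instead of a single my.gif.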