How can I download the images from the site to my hard disk?

I'm trying to download the images themselves instead of only extracting the date and time of each image. The code below works, but only for the dates and times.

using System;
using System.Linq;
using System.IO;
using System.Xml;
using System.Net;
using HtmlAgilityPack;
                    
public class Program
{
    public static void Main()
    {
        var wc = new WebClient();
        wc.BaseAddress = "https://something.com";
        HtmlDocument doc = new HtmlDocument();
        
        var temp = wc.DownloadData("/en");
        doc.Load(new MemoryStream(temp));       
        
        var secTokenScript = doc.DocumentNode.Descendants()
            .Where(e =>
                   String.Compare(e.Name, "script", true) == 0 &&
                   String.Compare(e.ParentNode.Name, "div", true) == 0 &&
                   e.InnerText.Length > 0 &&
                   e.InnerText.Trim().StartsWith("var region")
                  ).FirstOrDefault().InnerText;
        var securityToken = secTokenScript;
        securityToken = securityToken.Substring(0, securityToken.IndexOf("arrayImageTimes.push"));  
        securityToken = secTokenScript.Substring(securityToken.Length).Replace("arrayImageTimes.push('", "").Replace("')", "");
        var dates = securityToken.Trim().Split(new string[] { ";"}, StringSplitOptions.RemoveEmptyEntries);
        var scriptDates = dates.Select(x => new ScriptDate { DateString = x });
        foreach(var date in scriptDates) 
        {
            Console.WriteLine("Date String: '" + date.DateString + "'\tYear: '" + date.Year + "'\t Month: '" + date.Month + "'\t Day: '" + date.Day + "'\t Hours: '" + date.Hours + "'\t Minutes: '" + date.Minutes + "'");
        }
        
    }
    
    
    public class ScriptDate
    {
        public string DateString {get;set;}
        public int Year 
        {
            get
            {
                return Convert.ToInt32(this.DateString.Substring(0, 4));
            }
        }
        public int Month
        {
            get
            {
                return Convert.ToInt32(this.DateString.Substring(4, 2));
            }
        }
        public int Day
        {
            get
            {
                return Convert.ToInt32(this.DateString.Substring(6, 2));
            }
        }
        public int Hours
        {
            get
            {
                return Convert.ToInt32(this.DateString.Substring(8, 2));
            }
        }
        public int Minutes
        {
            get
            {
                return Convert.ToInt32(this.DateString.Substring(10, 2));
            }
        }                       
    }
}

But how can I use the same code to download and save the images?

I tried this, but I get an exception:

private void Download()
{
    using (WebClient client = new WebClient()) // WebClient implements IDisposable
    {
        client.DownloadFile("https://something.com", @"C:\temp\localfile.html");
    }
}

System.Net.WebException: 'The remote server returned an error: (500) Internal Server Error.'

I can get the dates and times, but I can't get the page source to extract the image links.

Here is an example of how the link for one of the images should be built:

https://something.com/image?type=infraPolair&region=tu&timestamp=202012150230

But I want to extract the dates and times for all the images from the page automatically, build the links from them, and then download the images.

With the first code I can get the date and time of each image, but I can't download the page source, so I can't extract or build the image links.

That's why I thought of somehow using the first code to also build the image links and then download the images.
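
The link-building step described above could be sketched roughly as follows. This is only a minimal sketch: the URL pattern is assumed from the single example link in the question, the `ImageLinks` class and its method names are hypothetical, and the `.jpg` extension is a guess at the image format. It expects the timestamp strings already extracted by the first code (12 digits, `yyyyMMddHHmm`).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;

public static class ImageLinks
{
    // Assumption: every image follows the pattern of the one example link in the question.
    private const string UrlTemplate =
        "https://something.com/image?type=infraPolair&region=tu&timestamp={0}";

    // True for strings that look like yyyyMMddHHmm, e.g. "202012150230".
    public static bool IsTimestamp(string value)
    {
        return value != null && value.Length == 12 && value.All(char.IsDigit);
    }

    // Builds the image URL for a single timestamp string.
    public static string BuildImageUrl(string timestamp)
    {
        return string.Format(UrlTemplate, timestamp);
    }

    // Downloads one image per valid timestamp into targetFolder,
    // naming each file after its timestamp so the names stay unique.
    public static void DownloadAll(IEnumerable<string> timestamps, string targetFolder)
    {
        Directory.CreateDirectory(targetFolder);
        using (var client = new WebClient())
        {
            foreach (var timestamp in timestamps.Select(t => t.Trim()).Where(IsTimestamp))
            {
                // Assumption: the images are JPEGs; adjust the extension if they are not.
                var localPath = Path.Combine(targetFolder, timestamp + ".jpg");
                client.DownloadFile(BuildImageUrl(timestamp), localPath);
            }
        }
    }
}
```

With the `dates` array from the first code, the call would be something like `ImageLinks.DownloadAll(dates, @"C:\temp\images")`. Whether the server accepts these requests is not something the sketch can guarantee.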

I'm not exactly sure what you are trying to achieve with the datetimes, but with HAP (HtmlAgilityPack) you can do something like this:


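// HtmlAgilityPack uses HtmlNodeCollection/HtmlNode (HtmlElement is the WinForms type).
HtmlNodeCollection elements = doc.DocumentNode.SelectNodes("//img");
if (elements != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode imageElement in elements)
    {
        // GetAttributeValue avoids a NullReferenceException on <img> tags without src
        var imageSrc = imageElement.GetAttributeValue("src", "");
        if (imageSrc != "") Download(imageSrc);
    }
}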

... ...


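private void Download(string src)
{
    // Any unique name works; a GUID is used here as a placeholder.
    string uniqueNameForFile = Guid.NewGuid().ToString("N");
    using (WebClient client = new WebClient()) // WebClient implements IDisposable
    {
        client.DownloadFile(src, @"C:\temp\" + uniqueNameForFile + ".jpg");
    }
}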

This is not a perfect answer, but it should get you going. The src might be relative, so you may have to add a base address.
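
One possible way to handle a relative src is to resolve it against the page's base address with `System.Uri`. The sketch below assumes you know the base address of the page you loaded; the `SrcResolver` class and `ToAbsolute` method are hypothetical names used only for illustration.

```csharp
using System;

public static class SrcResolver
{
    // Resolves an <img> src against the address of the page it came from.
    // An src that is already absolute is returned as-is, because
    // Uri(Uri, string) ignores the base when the second argument is absolute.
    public static string ToAbsolute(string baseAddress, string src)
    {
        var baseUri = new Uri(baseAddress);
        return new Uri(baseUri, src).AbsoluteUri;
    }
}
```

For example, `SrcResolver.ToAbsolute("https://something.com", "/images/a.jpg")` gives `https://something.com/images/a.jpg`, and the result can then be passed to `Download`.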

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 